Reinforcement Learning Based Adaptive Blocklength and MCS for Optimizing Age Violation Probability


Ozkaya A., Topbas A., Ceran Arslan E. T.

IEEE Access, vol. 11, pp. 122411-122425, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 11
  • Publication Date: 2023
  • DOI: 10.1109/access.2023.3326748
  • Journal Name: IEEE Access
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp. 122411-122425
  • Keywords: adaptive modulation and coding, Age of Information, dynamic programming, finite blocklength, reinforcement learning
  • Middle East Technical University Affiliated: Yes

Abstract

As a measure of the freshness of data, Age of Information (AoI) has become an essential performance metric in status update applications with stringent timeliness constraints. This study employs adaptive strategies to minimize a novel, information freshness-based performance metric, the age violation probability (AVP), defined as the probability of the instantaneous age exceeding a predefined constraint, in short packet communications (SPC). AVP can be considered one of the key performance indicators (KPIs) in 5G Ultra-Reliable Low Latency Communications (URLLC), and it is expected to gain importance in 6G technologies, especially in extreme URLLC (xURLLC). Two distinct approaches are considered: the first focuses on adaptively selecting the blocklength with either imperfect or missing channel state information, exploiting finite blocklength theory approximations; the second involves dynamically choosing the modulation and coding scheme (MCS) to minimize the AVP under stringent timeliness constraints and non-asymptotic information theory bounds. In the context of adaptive blocklength selection, state-aggregated value iteration, Q-learning algorithms, and finite blocklength theory approximations are leveraged to adaptively adjust blocklengths and achieve low age violation probabilities. The simulation results highlight the effectiveness of these algorithms in minimizing age violation probabilities compared to fixed blocklengths under varying channel conditions. Additionally, constructing a deep reinforcement learning (DRL) framework, we propose a deep Q-network policy for the dynamic selection of the modulation and coding scheme among the available MCSs defined for URLLC systems. Through comprehensive simulations, we demonstrate the superiority of the proposed adaptive methods over traditional benchmark methods.
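The two ingredients the abstract combines, the finite-blocklength (normal) approximation of the packet error probability and tabular Q-learning with state aggregation over the age, can be illustrated with a minimal sketch. The sketch below is not the paper's exact model: the payload size `K_BITS`, the candidate blocklengths in `ACTIONS`, the age dynamics, and the violation-based reward are all illustrative assumptions chosen only to show how the pieces fit together.

```python
import math
import random

random.seed(0)

K_BITS = 256                      # information bits per status update (assumed)
AGE_CAP = 2000                    # ages beyond this are aggregated into one state
ACTIONS = [100, 200, 400, 800]    # candidate blocklengths in channel uses (assumed)

def q_func(x):
    """Gaussian tail probability Q(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def fbl_error_prob(n, k, snr):
    """Normal approximation to the packet error probability when sending
    k information bits over n channel uses of an AWGN link at the given SNR."""
    cap = math.log2(1 + snr)                                    # capacity, bits/use
    disp = (snr * (snr + 2) / (snr + 1) ** 2) * math.log2(math.e) ** 2  # dispersion
    return q_func((n * cap - k) / math.sqrt(n * disp))

def step(age, n, snr, age_limit):
    """One update attempt (toy age dynamics): on success the age resets to
    the transmission delay n; on failure it grows by n. The reward penalizes
    an age violation, so minimizing it targets a low AVP."""
    success = random.random() > fbl_error_prob(n, K_BITS, snr)
    new_age = n if success else min(age + n, AGE_CAP)
    reward = 0.0 if new_age <= age_limit else -1.0
    return new_age, reward

def bucket(age):
    """State aggregation: coarse age bins keep the Q-table small."""
    return min(age // 100, AGE_CAP // 100)

def train(snr=1.0, age_limit=600, steps=20000, alpha=0.1, gamma=0.95, explore=0.1):
    """Tabular Q-learning over (aggregated age, blocklength) pairs; returns
    the greedy blocklength index for each visited age bucket."""
    q = {}
    age = ACTIONS[0]
    for _ in range(steps):
        s = bucket(age)
        if random.random() < explore:                 # epsilon-greedy exploration
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q.get((s, i), 0.0))
        age, r = step(age, ACTIONS[a], snr, age_limit)
        s2 = bucket(age)
        best_next = max(q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return {s: max(range(len(ACTIONS)), key=lambda i: q.get((s, i), 0.0))
            for s in {key[0] for key in q}}
```

The reward structure captures the trade-off at the heart of adaptive blocklength selection: a longer blocklength lowers the decoding error probability but adds transmission delay, so the learned policy must balance reliability against age growth.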