A Reinforcement Learning Approach to Age of Information in Multi-User Networks With HARQ

Ceran, Elif; Gunduz, Deniz; Gyorgy, Andras

doi:10.1109/jsac.2021.3065057

Ceran E. T., Gunduz D., Gyorgy A.

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, vol.39, no.5, pp.1412-1426, 2021 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 39 Issue: 5
Publication Date: 2021
DOI: 10.1109/jsac.2021.3065057
Journal Name: IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, MLA - Modern Language Association Database, zbMATH, Civil Engineering Abstracts
Pages: pp.1412-1426
Keywords: Protocols, Optimal scheduling, Standards, Automatic repeat request, Receivers, Time-varying systems, Information age, Age of information, hybrid automatic repeat request (HARQ), constrained Markov decision process, reinforcement learning, Whittle index
Middle East Technical University Affiliated: Yes

Abstract

Scheduling the transmission of time-sensitive information from a source node to multiple users over error-prone communication channels is studied with the goal of minimizing the long-term average age of information (AoI) at the users. A long-term average resource constraint is imposed on the source, which limits the average number of transmissions. The source can transmit to only one user in each time slot; after each transmission, it receives instantaneous ACK/NACK feedback from the intended receiver and decides when and to which user to transmit the next update. Assuming the channel statistics are known, the optimal scheduling policy is studied for both the standard automatic repeat request (ARQ) and hybrid ARQ (HARQ) protocols. Then, a reinforcement learning (RL) approach is introduced to find a near-optimal policy, which does not assume any a priori information on the random processes governing the channel states. Different RL methods, including average-cost SARSA with linear function approximation (LFA), upper confidence reinforcement learning (UCRL2), and deep Q-network (DQN), are applied and compared through numerical simulations.
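
To make the setting in the abstract concrete, the following is a minimal Python sketch of how per-user AoI might evolve under standard ARQ. It is not the paper's method. The function name `simulate_aoi`, the "serve the oldest user" rule, the random transmit decision standing in for the average resource constraint, the success probabilities, and the convention that a delivered update resets the age to 1 are all assumptions made for illustration only.

```python
import random


def simulate_aoi(success_prob, tx_rate, num_slots, seed=0):
    """Estimate the average AoI of a hypothetical max-age scheduler under ARQ.

    success_prob: per-user probability that a transmission is decoded
                  (illustrative values, not taken from the paper).
    tx_rate:      probability that the source transmits in a slot; a crude
                  stand-in for the long-term average resource constraint.
    Returns (average AoI per user per slot, fraction of slots used).
    """
    rng = random.Random(seed)
    ages = [1] * len(success_prob)  # assumed initial age of 1 for every user
    total_age = 0
    transmissions = 0

    for _ in range(num_slots):
        target = None
        if rng.random() < tx_rate:
            # Assumed heuristic: serve the user with the largest current age.
            target = max(range(len(ages)), key=lambda i: ages[i])
            transmissions += 1

        # ACK if the single transmission of this slot is decoded, else NACK.
        delivered = target is not None and rng.random() < success_prob[target]

        for i in range(len(ages)):
            if delivered and i == target:
                ages[i] = 1   # assumed: a fresh update resets the age to 1
            else:
                ages[i] += 1  # otherwise the information grows one slot older

        total_age += sum(ages)

    avg_aoi = total_age / (num_slots * len(ages))
    return avg_aoi, transmissions / num_slots


if __name__ == "__main__":
    aoi, used = simulate_aoi([0.9, 0.6, 0.3], tx_rate=0.5, num_slots=10000)
    print(f"average AoI: {aoi:.2f}, fraction of slots used: {used:.2f}")
```

Lowering `tx_rate` in this sketch tightens the resource budget and raises the average age, which is the trade-off the constrained scheduling problem addresses. A learned policy of the kind the abstract describes would replace the fixed heuristic and would not need `success_prob` in advance.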