Population-based exploration in reinforcement learning through repulsive reward shaping using eligibility traces


Bal M. I., İYİGÜN C., POLAT F., AYDIN H.

Annals of Operations Research, vol.335, no.2, pp.689-725, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 335 Issue: 2
  • Publication Date: 2024
  • DOI: 10.1007/s10479-023-05798-1
  • Journal Name: Annals of Operations Research
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, ABI/INFORM, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Computer & Applied Sciences, INSPEC, Public Affairs Index, zbMATH, Civil Engineering Abstracts
  • Page Numbers: pp.689-725
  • Keywords: Coordinated agents, Eligibility traces, Population-based exploration, Reinforcement learning, Reward shaping
  • Middle East Technical University Affiliated: Yes

Abstract

Efficient exploration plays a key role in accelerating the learning performance and sample efficiency of reinforcement learning (RL) tasks. In this paper we propose a framework that serves as a population-based repulsive reward shaping mechanism using eligibility traces to improve the efficiency of state-space exploration under a tabular RL representation. The framework consists of a hierarchical structure of RL agents, in which a higher-level repulsive-reward-shaper agent (RRS-Agent) coordinates the exploration of its population of sub-agents through repulsion whenever the necessary conditions on their eligibility traces are met. Empirical results on well-known benchmark problem domains show that the framework achieves efficient exploration with a significant improvement in learning performance and state-space coverage. Furthermore, the transparency of the framework makes the coordinated exploration decisions taken by the agents in the hierarchy explainable and supports the interpretability of the approach.
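
The abstract describes a population of tabular sub-agents whose exploration is coordinated by repulsive reward shaping conditioned on eligibility traces. The sketch below illustrates that general idea, assuming Q(λ) sub-agents on a toy chain MDP and a simple trace-sum repulsion penalty; the environment, threshold, and penalty form are illustrative assumptions, not the paper's actual RRS-Agent algorithm.

```python
import numpy as np

# Minimal sketch of population-based repulsive reward shaping on a toy
# 1-D chain environment. All details (environment, penalty form, trace
# threshold) are illustrative assumptions, not the paper's algorithm.

N_STATES, N_ACTIONS = 12, 2          # chain of states; actions: left / right
GAMMA, ALPHA, LAMBDA = 0.95, 0.1, 0.9
REPULSION, TRACE_THRESHOLD = 0.5, 0.1

def step(s, a):
    """Toy chain MDP: reward 1 only at the right end, else 0."""
    s_next = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r, s_next == N_STATES - 1

class SubAgent:
    """Tabular Q(lambda) learner with replacing eligibility traces."""
    def __init__(self, rng):
        self.Q = np.zeros((N_STATES, N_ACTIONS))
        self.e = np.zeros((N_STATES, N_ACTIONS))
        self.rng = rng

    def act(self, s, eps=0.2):
        if self.rng.random() < eps:
            return int(self.rng.integers(N_ACTIONS))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        delta = r + GAMMA * self.Q[s_next].max() - self.Q[s, a]
        self.e *= GAMMA * LAMBDA
        self.e[s, a] = 1.0               # replacing traces
        self.Q += ALPHA * delta * self.e

def shaped_reward(r, s, agent_idx, agents):
    """Repulsive shaping: penalize states whose eligibility traces are still
    high for the *other* sub-agents, i.e., states recently visited by peers."""
    peer_trace = sum(ag.e[s].max() for j, ag in enumerate(agents) if j != agent_idx)
    return r - REPULSION * peer_trace if peer_trace > TRACE_THRESHOLD else r

def run(n_agents=3, episodes=200, seed=0):
    rng = np.random.default_rng(seed)
    agents = [SubAgent(rng) for _ in range(n_agents)]
    for _ in range(episodes):
        states, done = [0] * n_agents, [False] * n_agents
        for _ in range(100):             # episode step limit
            for i, ag in enumerate(agents):
                if done[i]:
                    continue
                a = ag.act(states[i])
                s_next, r, done[i] = step(states[i], a)
                ag.update(states[i], a, shaped_reward(r, s_next, i, agents), s_next)
                states[i] = s_next
            if all(done):
                break
    return agents

if __name__ == "__main__":
    agents = run()
    print("Greedy value of start state per agent:",
          [round(float(ag.Q[0].max()), 3) for ag in agents])
```

The repulsion term plays the role the abstract attributes to the higher-level coordinator: when a peer's trace at a state is still high, the shaped reward pushes the current agent away from that state, spreading the population across the state-space.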