Population-based exploration in reinforcement learning through repulsive reward shaping using eligibility traces


Bal M. I., İYİGÜN C., POLAT F., AYDIN H.

Annals of Operations Research, 2024 (SCI-Expanded) identifier

  • Yayın Türü: Makale / Tam Makale
  • Basım Tarihi: 2024
  • Doi Numarası: 10.1007/s10479-023-05798-1
  • Dergi Adı: Annals of Operations Research
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, ABI/INFORM, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Computer & Applied Sciences, INSPEC, Public Affairs Index, zbMATH, Civil Engineering Abstracts
  • Anahtar Kelimeler: Coordinated agents, Eligibility traces, Population-based exploration, Reinforcement learning, Reward shaping
  • Orta Doğu Teknik Üniversitesi Adresli: Evet

Özet

Efficient exploration plays a key role in accelerating the learning performance and sample efficiency of reinforcement learning tasks. In this paper we propose a framework that serves as a population-based repulsive reward shaping mechanism using eligibility traces to enhance the efficiency in exploring the state-space under the scope of tabular reinforcement learning representation. The framework contains a hierarchical structure of RL agents, where a higher level repulsive-reward-shaper agent (RRS-Agent) coordinates the exploration of its population of sub-agents through repulsion when necessary conditions on their eligibility traces are met. Empirical results on well-known benchmark problem domains show that the framework indeed achieves efficient exploration with a significant improvement in learning performance and state-space coverage. Furthermore, the transparency of the proposed framework enables explainable decisions made by the agents in the hierarchical structure to explore the state-space in a coordinated manner and supports the interpretability of the framework.