Population-based exploration in reinforcement learning through repulsive reward shaping using eligibility traces


Bal M. I., İYİGÜN C., POLAT F., AYDIN H.

Annals of Operations Research, vol.335, no.2, pp.689-725, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 335 Issue: 2
  • Publication Date: 2024
  • DOI: 10.1007/s10479-023-05798-1
  • Journal Name: Annals of Operations Research
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, ABI/INFORM, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Computer & Applied Sciences, INSPEC, Public Affairs Index, zbMATH, Civil Engineering Abstracts
  • Page Numbers: pp.689-725
  • Keywords: Coordinated agents, Eligibility traces, Population-based exploration, Reinforcement learning, Reward shaping
  • Middle East Technical University Affiliated: Yes

Abstract

Efficient exploration plays a key role in accelerating the learning performance and sample efficiency of reinforcement learning (RL) tasks. In this paper we propose a framework that serves as a population-based repulsive reward shaping mechanism using eligibility traces to improve the efficiency of state-space exploration under a tabular RL representation. The framework consists of a hierarchical structure of RL agents, in which a higher-level repulsive-reward-shaper agent (RRS-Agent) coordinates the exploration of its population of sub-agents through repulsion whenever the necessary conditions on their eligibility traces are met. Empirical results on well-known benchmark problem domains show that the framework achieves efficient exploration with a significant improvement in learning performance and state-space coverage. Furthermore, the transparency of the framework makes the coordinated exploration decisions taken by the agents in the hierarchy explainable and supports the interpretability of the approach.
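
The abstract describes a population of tabular sub-agents whose exploration is coordinated by repulsive reward shaping conditioned on eligibility traces. The sketch below illustrates that general idea, assuming Q(λ) sub-agents on a toy chain MDP and a simple trace-sum repulsion penalty; the environment, threshold, and penalty form are illustrative assumptions, not the paper's actual RRS-Agent algorithm.

```python
import numpy as np

# Minimal sketch of population-based repulsive reward shaping on a toy
# 1-D chain environment. All details (environment, penalty form, trace
# threshold) are illustrative assumptions, not the paper's algorithm.

N_STATES, N_ACTIONS = 12, 2          # chain of states; actions: left / right
GAMMA, ALPHA, LAMBDA = 0.95, 0.1, 0.9
REPULSION, TRACE_THRESHOLD = 0.5, 0.1

def step(s, a):
    """Toy chain MDP: reward 1 only at the right end, else 0."""
    s_next = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r, s_next == N_STATES - 1

class SubAgent:
    """Tabular Q(lambda) learner with replacing eligibility traces."""
    def __init__(self, rng):
        self.Q = np.zeros((N_STATES, N_ACTIONS))
        self.e = np.zeros((N_STATES, N_ACTIONS))
        self.rng = rng

    def act(self, s, eps=0.2):
        if self.rng.random() < eps:
            return int(self.rng.integers(N_ACTIONS))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        delta = r + GAMMA * self.Q[s_next].max() - self.Q[s, a]
        self.e *= GAMMA * LAMBDA
        self.e[s, a] = 1.0               # replacing traces
        self.Q += ALPHA * delta * self.e

def shaped_reward(r, s, agent_idx, agents):
    """Repulsive shaping: penalize states whose eligibility traces are still
    high for the *other* sub-agents, i.e., states recently visited by peers."""
    peer_trace = sum(ag.e[s].max() for j, ag in enumerate(agents) if j != agent_idx)
    return r - REPULSION * peer_trace if peer_trace > TRACE_THRESHOLD else r

def run(n_agents=3, episodes=200, seed=0):
    rng = np.random.default_rng(seed)
    agents = [SubAgent(rng) for _ in range(n_agents)]
    for _ in range(episodes):
        states, done = [0] * n_agents, [False] * n_agents
        for _ in range(100):             # episode step limit
            for i, ag in enumerate(agents):
                if done[i]:
                    continue
                a = ag.act(states[i])
                s_next, r, done[i] = step(states[i], a)
                ag.update(states[i], a, shaped_reward(r, s_next, i, agents), s_next)
                states[i] = s_next
            if all(done):
                break
    return agents

if __name__ == "__main__":
    agents = run()
    print("Greedy value of start state per agent:",
          [round(float(ag.Q[0].max()), 3) for ag in agents])
```

The repulsion term plays the role the abstract attributes to the higher-level coordinator: when a peer's trace at a state is still high, the shaped reward pushes the current agent away from that state, spreading the population across the state-space.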