Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states

Aydin, HÜSEYİN; Çilden, Erkin; Polat, FARUK

doi:10.1016/j.future.2022.03.016

Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states

Aydin H., Çilden E., Polat F.

Future Generation Computer Systems, cilt.133, ss.153-168, 2022 (SCI-Expanded, Scopus)

Yayın Türü: Makale / Tam Makale
Cilt numarası: 133
Basım Tarihi: 2022
Doi Numarası: 10.1016/j.future.2022.03.016
Dergi Adı: Future Generation Computer Systems
Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
Sayfa Sayıları: ss.153-168
Anahtar Kelimeler: Reinforcement learning, Task decomposition, Chains of bottleneck transitions
Orta Doğu Teknik Üniversitesi Adresli: Evet

Özet

© 2022 Elsevier B.V.Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that are not solvable by conventional methods turn out to be easier to handle by using a convenient problem decomposition, followed by the incorporation of machine learning methods for the sub-problems. Like in most real-life problems, the decomposition of a task usually stems from the sequence of sub-tasks that must be achieved in order to get the main task done. In this study, assuming that unambiguous states are provided in advance, a decomposition of the problem is constructed by the agent based on a set of chains of bottleneck transitions, which are sequences of unambiguous and critical transitions leading to the goal state. At the higher level, an agent trains its sub-agents to extract sub-policies corresponding to the sub-tasks, namely two successive transitions in any chain, and learns the value of each sub-policy at the abstract level. Experimental study demonstrates that an early decomposition based on useful bottleneck transitions eliminates the necessity for excessive memory and improves the learning performance of the agent. It is also shown that knowing the correct order of bottleneck transitions in the decomposition results in faster construction of the solution.