A context aware model for autonomous agent stochastic planning

Ekmekci, ÖMER; POLAT, FARUK

doi:10.1016/j.robot.2018.11.013

A context aware model for autonomous agent stochastic planning

Ekmekci O., POLAT F.

ROBOTICS AND AUTONOMOUS SYSTEMS, cilt.112, ss.137-153, 2019 (SCI-Expanded, Scopus)

Yayın Türü: Makale / Tam Makale
Cilt numarası: 112
Basım Tarihi: 2019
Doi Numarası: 10.1016/j.robot.2018.11.013
Dergi Adı: ROBOTICS AND AUTONOMOUS SYSTEMS
Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Sayfa Sayıları: ss.137-153
Orta Doğu Teknik Üniversitesi Adresli: Evet

Özet

Markov Decision Processes (MDPs) are not able to make use of domain information effectively due to their representational limitations. The lacking of elements which enable the models be aware of context, leads to unstructured representation of that problem such as raw probability matrices or lists. This causes these tools significantly less efficient at determining a useful policy as the state space of a task grows, which is the case for more realistic problems having localized dependencies between states and actions. In this paper, we present a new state machine, called Context-Aware Markov Decision Process (CA-MDP) based on MDP for the purpose of representing Markovian sequential decision making problems in a more structured manner. CA-MDP changes and augments MDP facilities by integrating causal relationships between actions and states thereby enabling structural, hence compact if possible, representation of the tasks. To show the expressive power of CA-MDP, we give the theoretical bounds for complexity of conversion between MDP and CA-MDP to demonstrate the expressive power of CA-MDP. Next, to generate an optimal policy from CA-MDP encoding by exploiting those newly defined facilities, we devised a new solver algorithm based on value iteration (VI), called Context-Aware Value Iteration (CA-VI). Although regular dynamic programming (DP) based algorithms is successful at effectively determining optimal policies, they do not scale well with respect to state-action space, making both the MDP encoding and related solver mechanism practically unusable for real-life problems. Our solver algorithm gets the power of overcoming the scalability problem by integrating the structural information provided in CA-MDP. First, we give theoretical analysis of CA-VI by examining the expected number of Bellman updates being performed on arbitrary tasks. Finally, we present our conducted experiments on numerous problems, with important remarks and discussions on certain aspects of CA-VI and CA-MDP, to justify our theoretical analyses empirically and to assess the real performance of CA-VI with CA-MDP formulation by analysing the execution time by checking how close it gets to the practical minimum runtime bound with respect to VI performance with MDP encoding of the same task. (C) 2018 Elsevier B.V. All rights reserved.