Kısmi gözlemlenebilir markov karar süreçleri için budama algoritmaları.


Tezin Türü: Doktora

Tezin Yürütüldüğü Kurum: Orta Doğu Teknik Üniversitesi, Mühendislik Fakültesi, Elektrik ve Elektronik Mühendisliği Bölümü, Türkiye

Tezin Onay Tarihi: 2017

Tezin Dili: İngilizce

Öğrenci: Selim Özgen

Danışman: MÜBECCEL DEMİREKLER

Açık Arşiv Koleksiyonu: AVESİS Açık Erişim Koleksiyonu

Özet:

It is possible to represent the value function in partially observable Markov decision processes as a piecewise linear function if the state, action, and observation space is discrete. Exact value iteration algorithm searches for this value function by creating an exponential number of linear functions at each step, many of which can be pruned without changing the value of the value function. The pruning procedure is made possible by the use of linear programming. This study first gives a geometric framework of the pruning procedure. It shows that the linear programming iterations refer to the selection of different convex regions in the vector space representation of the pruning problem. We also put forward an algebraic framework, which is the utilization and maintenance of linear programs. It shows how the problem can be decomposed into small sized LPs and what the LP iterations refer to. While stating these two theoretical frameworks, their relations have also been exploited. The exponential increase in the number of vectors in any step of the exact value iteration algorithm is due to an operation called the cross-sum addition of a set of vectors. This operation results in a new set of vectors. It is known that for any of the summand vectors in this new set to be non-dominated, the addend vectors entering the cross-sum addition should have intersecting support sets. The given geometric and algebraic framework has further been extended to exploit this particular property of the cross-sum operation. Two novel pruning algorithms have been offered in this study. First algorithm, called FastCone, can be used for pruning any given set of vectors. For a given set of clean vectors at any step, the algorithm hastily searches for the convex region that a dirty vector is in and tries to find a clean vector if only the given set of clean vectors is not sufficient to make the decision about this dirty vector. The second algorithm is called Cross-Sum Pruning with Multiple Objective Functions, where the aim is to find the vectors that have non-intersecting support sets with the current active vectors in each simplex iteration. This approach is useful because when two vectors from two different sets with non-intersecting support sets are detected, it is possible to delete all ordered pairs containing these two vectors. And this amounts to a simple sign check of the coefficients of a row of the simplex tableau. To show the algorithms' performance, both algorithms have been compared to the conventional algorithms and their revised versions both analytically and experimentally.