2nd International Workshop on the Numerical Solution of Markov Chains, North Carolina, United States of America, 18 January 1995, pp. 283-306
We consider a Markov decision process (MDP) with finite state space S and finite action set A. The state space is partitioned into K sets S_1, S_2, ..., S_K. A stationary randomized policy is described by the parameters {alpha_{ia}, i in S, a in A}, where alpha_{ia} is the probability that action a is taken when the system is in state i. A policy is called implementable if alpha_{ia} = alpha_{ja} for all a in A whenever states i and j belong to a common subset S_r for some r = 1, 2, ..., K. In this paper we discuss the importance of implementable policies and present an algorithm to find implementable policies that (locally) minimize the expected total discounted cost over an infinite horizon.
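To make the definitions concrete, the following is a minimal sketch (not the paper's algorithm) of how an implementable policy can be parameterized and its expected total discounted cost evaluated. The MDP data, the partition, and all numerical values are illustrative assumptions; the cost is obtained by solving the standard policy-evaluation system (I - gamma * P_pi) v = c_pi.

```python
import numpy as np

# Hypothetical toy MDP (all numbers illustrative, not from the paper).
# States 0..3, actions 0..1; partition S_1 = {0, 1}, S_2 = {2, 3}.
S, A = 4, 2
partition = [[0, 1], [2, 3]]  # K = 2 subsets
gamma = 0.9                   # discount factor

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[i, a] = next-state distribution
c = rng.uniform(0.0, 1.0, size=(S, A))      # c[i, a] = one-step cost

def implementable_policy(group_probs):
    """Expand per-subset action probabilities into alpha[i, a].

    Every state in the same partition subset S_r receives the same
    action distribution, which is exactly the implementability
    condition alpha_{ia} = alpha_{ja} for i, j in S_r."""
    alpha = np.empty((S, A))
    for r, states in enumerate(partition):
        alpha[states] = group_probs[r]
    return alpha

def discounted_cost(alpha):
    """Expected total discounted cost v solving (I - gamma P_pi) v = c_pi."""
    P_pi = np.einsum('ia,iaj->ij', alpha, P)  # transition matrix under policy
    c_pi = np.einsum('ia,ia->i', alpha, c)    # expected one-step cost
    return np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)

# One action distribution per subset, shared by all states in that subset.
alpha = implementable_policy(np.array([[0.7, 0.3], [0.2, 0.8]]))
v = discounted_cost(alpha)
print(v)  # expected discounted cost starting from each state
```

An optimization over implementable policies would then search over the K per-subset distributions (e.g. by gradient or coordinate methods) rather than over one distribution per state, which is why only local minima can in general be guaranteed.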