TD-Gammon revisited: integrating invalid actions and dice factor in continuous action and observation space

Thesis Type: Postgraduate

Institution Of The Thesis: Orta Doğu Teknik Üniversitesi, Faculty of Engineering, Department of Computer Engineering, Turkey

Approval Date: 2018




After TD-Gammon's success in 1991, the interest in game-playing agents has risen significantly. With the developments in Deep Learning and emulations for older games have been created, human-level control for Atari games has been achieved and Deep Reinforcement Learning has proven itself to be a success. However, the ancestor of DRL, TD-Gammon, and its game Backgammon got out of sight, because of the fact that Backgammon's actions are much more complex than other games (most of the Atari games has 2 or 4 different actions), the huge action space has much invalid actions, and there is a dice factor which involves stochasticity. Last but not least, the professional level in Backgammon has been achieved a long time ago. In this thesis, the latest methods in DRL will be tested against its ancestor game, Backgammon, while trying to teach how to select valid moves and considering the dice factor.