IEEE Access (Jan 2022)

Modular Reinforcement Learning for Playing the Game of Tron

  • Mingi Jeon,
  • Jay Lee,
  • Sang-Ki Ko

DOI
https://doi.org/10.1109/ACCESS.2022.3175299
Journal volume & issue
Vol. 10
pp. 63394–63402

Abstract


Tron is a simultaneous-move two-player game in which a wall is created along the path of each agent, and the agent that crashes into a wall first loses. Because the same action may result in different outcomes (non-stationarity), it is difficult to apply the basic approach of reinforcement learning. In this paper, we present a modular reinforcement learning (MRL) approach to the game of Tron that decomposes the game into two phases, where the first phase is non-stationary and the second phase is stationary. We train two separate models: the first handles the non-stationary environment, in which the two agents move simultaneously and affect each other, while the second handles the stationary environment that arises once the agents are separated by the walls they have created and can no longer affect each other. We show that the latter model can be effectively pre-trained on randomly generated stationary environments. We evaluate the performance of our algorithm by comparing it with previous algorithms, including the state-of-the-art algorithm for the game of Tron (called a1k0n), on different grid sizes. As a result, we demonstrate that the proposed MRL-based algorithm outperforms all previous algorithms on $6 \times 6$ and $8 \times 8$ grids. Although our algorithm performs slightly worse than the strongest baseline, a1k0n, on the $10 \times 10$ grid, it exhibits better scalability in terms of time complexity as the grid size increases than search-based heuristics, including a1k0n.
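The abstract's central mechanism is the switch between the two models, which hinges on detecting when the agents become separated by walls. The abstract does not specify how this is detected; a minimal sketch of one natural implementation is a flood-fill reachability test, shown below. The names `agents_separated`, `duel_model`, and `survival_model` are illustrative assumptions, not identifiers from the paper.

```python
from collections import deque

EMPTY, WALL = 0, 1  # assumed cell encoding for the Tron grid


def agents_separated(grid, pos_a, pos_b):
    """Return True if the opponent at pos_b is unreachable from pos_a,
    i.e. the game has entered the stationary (second) phase.

    Hypothetical sketch: the paper may detect the phase differently.
    """
    rows, cols = len(grid), len(grid[0])
    seen = {pos_a}
    queue = deque([pos_a])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) == pos_b:
                return False  # opponent still reachable: non-stationary phase
            if grid[nr][nc] == EMPTY and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return True  # opponent walled off: stationary phase


def select_model(grid, pos_a, pos_b, duel_model, survival_model):
    # Modular dispatch in the spirit of the abstract: one model per phase.
    if agents_separated(grid, pos_a, pos_b):
        return survival_model  # pre-trainable on random stationary boards
    return duel_model  # handles simultaneous, mutually affecting moves
```

Once separated, each agent's outcome depends only on its own region, which is why the abstract notes the survival model can be pre-trained on randomly generated stationary environments.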

Keywords