Analyses of Tabular AlphaZero on Strongly-Solved Stochastic Games

Chu-Hsuan Hsueh; Kokolo Ikeda; I-Chen Wu; Jr-Chang Chen; Tsan-Sheng Hsu

doi:10.1109/access.2023.3246638

IEEE Access (Jan 2023)

Affiliations

Chu-Hsuan Hsueh: School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Kokolo Ikeda: Department of Computer Science and Information Engineering, National Taipei University, New Taipei City, Taiwan
I-Chen Wu: Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Jr-Chang Chen: Department of Computer Science and Information Engineering, National Taipei University, New Taipei City, Taiwan
Tsan-Sheng Hsu: Institute of Information Science, Academia Sinica, Taipei, Taiwan

Journal volume & issue: Vol. 11
pp. 18157 – 18182

Abstract

The AlphaZero algorithm achieved superhuman levels of play in chess, shogi, and Go by learning without domain-specific knowledge except for the game rules. This paper targets stochastic games and investigates whether AlphaZero can learn theoretical values and optimal play. Since the theoretical values of stochastic games are expected win rates rather than a simple win, loss, or draw, it is worth investigating how well AlphaZero approximates the expected win rates of positions. This paper also thoroughly studies how AlphaZero is influenced by hyper-parameters and some implementation details. The analyses are mainly based on AlphaZero learning with lookup tables. Deep neural networks (DNNs) like those in the original AlphaZero are also tested and compared. The tested stochastic games include reduced and strongly-solved variants of Chinese dark chess and EinStein würfelt nicht!. The experiments showed that AlphaZero could learn policies that play almost optimally against the optimal player and could learn values accurately. More specifically, such good results were achieved across a wide range of hyper-parameter settings, though games on larger scales tended to have a somewhat narrower range of suitable hyper-parameters. In addition, the results of learning with DNNs were similar to those of learning with lookup tables.
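To make the notion of an expected win rate concrete, the following is a minimal Python sketch, not taken from the paper, of how the theoretical value of a position in a stochastic game can be computed by expectiminimax over a small explicit game tree. Decision nodes take the maximum (first player) or minimum (second player) of their children, and chance nodes, such as die rolls or piece flips, take the probability-weighted average. The node classes (Terminal, Chance, Decision), the function win_rate, and the toy tree are hypothetical illustrations; draws, which do occur in Chinese dark chess, are ignored here for simplicity.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Tuple, Union


@dataclass
class Terminal:
    first_player_wins: bool  # hypothetical toy model: no draws


@dataclass
class Chance:
    outcomes: List[Tuple[Fraction, "Node"]]  # (probability, child) pairs


@dataclass
class Decision:
    player: int  # 0 = first player (maximizes), 1 = second player (minimizes)
    children: List["Node"]


Node = Union[Terminal, Chance, Decision]


def win_rate(node: Node) -> Fraction:
    """Expected win rate of the first player under optimal play by both sides."""
    if isinstance(node, Terminal):
        return Fraction(1) if node.first_player_wins else Fraction(0)
    if isinstance(node, Chance):
        # Chance node (e.g., a die roll): probability-weighted average of children.
        return sum((p * win_rate(child) for p, child in node.outcomes), Fraction(0))
    # Decision node: the mover picks the child that is best for them.
    values = [win_rate(child) for child in node.children]
    return max(values) if node.player == 0 else min(values)


# Toy tree: the first player picks a "safe" or a "risky" action.
W, L = Terminal(True), Terminal(False)
safe = Chance([(Fraction(1, 2), W), (Fraction(1, 2), L)])
reply = Decision(1, [W, Chance([(Fraction(1, 4), W), (Fraction(3, 4), L)])])
risky = Chance([(Fraction(1, 2), W), (Fraction(1, 2), reply)])
root = Decision(0, [safe, risky])

print(win_rate(safe), win_rate(risky), win_rate(root))  # 1/2 5/8 5/8
```

In this toy tree the second player answers with the chance branch worth 1/4 rather than the certain loss, so the risky action is worth 1/2 × 1 + 1/2 × 1/4 = 5/8, which beats the 1/2 of the safe action. A fractional value such as 5/8, rather than a plain win or loss, is the kind of target a learned value function must approximate in a stochastic game.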

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
