Learning Macromanagement in Starcraft by Deep Reinforcement Learning

Wenzhen Huang; Qiyue Yin; Junge Zhang; Kaiqi Huang

doi:10.3390/s21103332

Sensors (May 2021)


Affiliations

Wenzhen Huang: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Qiyue Yin: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Junge Zhang: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Kaiqi Huang: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China

DOI: https://doi.org/10.3390/s21103332
Journal volume & issue: Vol. 21, no. 10, p. 3332

Abstract

StarCraft is a real-time strategy game that provides a complex environment for AI research. Macromanagement, i.e., selecting appropriate units to build depending on the current state, is one of the most important problems in this game. To reduce the need for expert knowledge and enhance the coordination of the systematic bot, we use reinforcement learning (RL) to tackle the problem of macromanagement. We propose a novel deep RL method, Mean Asynchronous Advantage Actor-Critic (MA3C), which computes an approximate expected policy gradient instead of the gradient of a sampled action to reduce the variance of the gradient, and encodes the history queue with a recurrent neural network to tackle the problem of imperfect information. The experimental results show that MA3C achieves a high win rate, approximately 90%, against the weaker opponents and improves the win rate by about 30% against the stronger opponents. We also propose a novel method to visualize and interpret the policy learned by MA3C. Combining the visualized results with snapshots of games, we find that the learned macromanagement not only adapts to the game rules and the policy of the opponent bot, but also cooperates well with the other modules of MA3C-Bot.
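
The abstract contrasts the gradient of a sampled action with an expected policy gradient. The sketch below illustrates that general idea for a softmax policy over a few discrete actions. It is not the authors' MA3C implementation: the abstract gives no formulas, so the function names, the toy action values in `q_values`, and the assumption that a value for every action is available are all hypothetical. The single-sample estimator weights the score of one drawn action, while the expected form sums over all actions under the policy and therefore has no sampling noise for a fixed state.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(probs, action):
    """Gradient of log softmax(logits)[action] w.r.t. the logits."""
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def sampled_gradient(probs, q_values, rng):
    """Single-sample estimator: Q(a) * grad log pi(a), with a drawn from pi."""
    action = rng.choices(range(len(probs)), weights=probs)[0]
    g = grad_log_pi(probs, action)
    return [q_values[action] * gi for gi in g]


def expected_gradient(probs, q_values):
    """Expected form: sum over actions of pi(a) * Q(a) * grad log pi(a)."""
    n = len(probs)
    total = [0.0] * n
    for a in range(n):
        g = grad_log_pi(probs, a)
        for i in range(n):
            total[i] += probs[a] * q_values[a] * g[i]
    return total
```

Averaging many calls to `sampled_gradient` approaches the output of `expected_gradient`; in practice the action values would come from a learned critic, which is why the abstract calls the expected gradient approximate.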

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
