Applied Sciences (Jul 2023)

Exploring the Use of Invalid Action Masking in Reinforcement Learning: A Comparative Study of On-Policy and Off-Policy Algorithms in Real-Time Strategy Games

  • Yueqi Hou
  • Xiaolong Liang
  • Jiaqiang Zhang
  • Qisong Yang
  • Aiwu Yang
  • Ning Wang

DOI
https://doi.org/10.3390/app13148283
Journal volume & issue
Vol. 13, no. 14
p. 8283

Abstract

Invalid action masking is a practical technique in deep reinforcement learning that prevents agents from taking invalid actions. Existing approaches rely on action masking during both policy training and execution. This study develops reinforcement learning algorithms that incorporate action masking during training but can be executed without it. We begin with a theoretical analysis that elucidates the distinction between the naive policy gradient and the invalid action policy gradient. Based on this analysis, we demonstrate that the naive policy gradient is a valid gradient and is equivalent to the proposed composite objective algorithm, which optimizes the masked policy and the original policy in parallel. Moreover, we propose an off-policy algorithm for invalid action masking that samples with the masked policy while optimizing the original policy. To compare the effectiveness of these algorithms, we conduct experiments in Gym-μRTS, a simplified real-time strategy (RTS) game simulator. Based on the empirical findings, we recommend the off-policy algorithm for most tasks and the composite objective algorithm for more complex tasks.
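
For readers new to the mechanics, the short Python sketch below illustrates the standard masking construction and the two gradients the abstract contrasts. It is a minimal sketch, assuming PyTorch, a discrete action space, and a hypothetical validity mask; the helper name masked_distribution and the example values are illustrative assumptions, not the authors' implementation.

    import torch

    def masked_distribution(logits, valid):
        """Build the masked policy pi' by sending invalid-action logits to -inf."""
        masked_logits = logits.masked_fill(~valid, float("-inf"))
        return torch.distributions.Categorical(logits=masked_logits)

    logits = torch.randn(4, requires_grad=True)       # policy network output for one state
    valid = torch.tensor([True, False, True, False])  # hypothetical validity mask

    pi_masked = masked_distribution(logits, valid)             # masked policy pi'
    pi_orig = torch.distributions.Categorical(logits=logits)   # original policy pi

    action = pi_masked.sample()  # invalid actions have zero probability under pi'

    # Invalid action policy gradient: differentiate log pi'(action | s),
    # so gradients flow through the masked, renormalized distribution.
    loss_masked = -pi_masked.log_prob(action)

    # Naive policy gradient: still sample from pi', but differentiate the
    # original, unmasked policy's log pi(action | s); the paper's off-policy
    # scheme likewise samples with pi' while optimizing pi.
    loss_naive = -pi_orig.log_prob(action)
    loss_naive.backward()        # gradients flow into the unmasked logits

In practice, a large negative constant such as -1e8 is often substituted for -inf to avoid numerical issues in some frameworks.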

Keywords