DESEM: Depthwise Separable Convolution-Based Multimodal Deep Learning for In-Game Action Anticipation

Changhyun Kim; Jinsoo Bae; Insung Baek; Jaeyoon Jeong; Young Jae Lee; Kiwoong Park; Sang Heun Shim; Seoung Bum Kim

doi:10.1109/ACCESS.2023.3271282

IEEE Access (Jan 2023)

Affiliations

Changhyun Kim: School of Industrial and Management Engineering, Korea University, Seoul, South Korea
Jinsoo Bae: School of Industrial and Management Engineering, Korea University, Seoul, South Korea
Insung Baek: School of Industrial and Management Engineering, Korea University, Seoul, South Korea
Jaeyoon Jeong: School of Industrial and Management Engineering, Korea University, Seoul, South Korea
Young Jae Lee: School of Industrial and Management Engineering, Korea University, Seoul, South Korea
Kiwoong Park: Agency for Defense Development (ADD), Seoul, South Korea
Sang Heun Shim: Agency for Defense Development (ADD), Seoul, South Korea
Seoung Bum Kim: School of Industrial and Management Engineering, Korea University, Seoul, South Korea

DOI: https://doi.org/10.1109/ACCESS.2023.3271282
Journal volume & issue: Vol. 11, pp. 46504–46512

Abstract

In real-time strategy (RTS) games, players must choose and execute the correct sequence of actions to defeat their opponents. Because RTS games such as StarCraft II run in real time, players have very little time to decide how to develop their strategy. In addition, players can observe only the parts of the map that they have explored. Therefore, unlike in Chess or Go, players do not know what their opponents are doing. For these reasons, applying commonly used artificial intelligence models to forecast sequential actions in RTS games is challenging. To address this, we propose depthwise separable convolution-based multimodal deep learning (DESEM) for forecasting sequential actions in StarCraft II. DESEM performs multimodal learning, taking high-dimensional frames and action labels simultaneously as inputs. We use a depthwise separable convolution as the backbone network for extracting features from the high-dimensional frames. In addition, we propose a weighted loss function to address class imbalance. We use 1,978 StarCraft II replays of Terran vs. Protoss games in which Terran wins. The experimental results show that the proposed depthwise separable convolution backbone outperforms conventional convolution. Furthermore, we demonstrate that multimodal learning and the weighted loss function contribute significantly to improving forecasting performance.
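
The two technical ingredients named in the abstract, a depthwise separable convolution backbone and a weighted loss for class imbalance, can be illustrated with a short sketch. The parameter-count formulas are the standard ones for these layer types. The inverse-frequency weighting is only an assumption made for illustration, since the abstract does not specify the paper's weighting scheme, and the action labels "build" and "attack" are hypothetical.

```python
import math
from collections import Counter


def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def inverse_frequency_weights(labels):
    """Assumed scheme (not from the paper): w_c = N / (num_classes * count_c)."""
    counts = Counter(labels)
    n, num_classes = len(labels), len(counts)
    return {c: n / (num_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, label, weights):
    """Class-weighted negative log-likelihood for a single sample.

    probs maps each class to its predicted probability.
    """
    return -weights[label] * math.log(probs[label])


# Hypothetical, imbalanced action labels: 8 "build" vs. 2 "attack".
labels = ["build"] * 8 + ["attack"] * 2
weights = inverse_frequency_weights(labels)

print(conv_params(3, 64, 128))                 # 73728
print(depthwise_separable_params(3, 64, 128))  # 8768
print(weights)                                 # {'build': 0.625, 'attack': 2.5}
```

For a 3 x 3 layer mapping 64 channels to 128, the separable form needs roughly one eighth of the weights of the standard convolution, and under the assumed weighting a mistake on the rare "attack" class costs four times as much as the same mistake on "build".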

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
