Applied Sciences (Feb 2025)

Solving Action Semantic Conflict in Physically Heterogeneous Multi-Agent Reinforcement Learning with Generalized Action-Prediction Optimization

  • Xiaoyang Yu,
  • Youfang Lin,
  • Shuo Wang,
  • Sheng Han

DOI
https://doi.org/10.3390/app15052580
Journal volume & issue
Vol. 15, no. 5
p. 2580

Abstract

Traditional multi-agent reinforcement learning (MARL) algorithms typically share parameters globally across heterogeneous agent types without differentiating between their distinct action semantics. This gives rise to the action semantic conflict problem, which weakens the generalization of policy networks across heterogeneous agent types and impairs cooperation in complex scenarios. Conversely, fully independent per-agent parameters significantly escalate computational cost and training complexity. To address these challenges, we introduce an adaptive MARL algorithm named Generalized Action-Prediction Optimization (GAPO). First, we introduce the Generalized Action Space (GAS), the union of all agent actions with distinct semantics. Every agent first computes a unified representation in the GAS and then generates its heterogeneous action policy through a type-specific available-action mask. Second, to further improve cooperation between heterogeneous groups, we propose a Cross-Group Prediction (CGP) loss, which adaptively predicts the action policies of other groups from trajectory information. We integrate GAPO into both value-based and policy-based MARL algorithms, yielding two practical algorithms: G-QMIX and G-MAPPO. Experimental results in the SMAC, MPE, MAMuJoCo, and RPE environments demonstrate the superiority of G-QMIX and G-MAPPO over several state-of-the-art MARL methods, validating the effectiveness of our adaptive generalized MARL approach.
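The PyTorch sketch below illustrates the two mechanisms the abstract describes: a shared policy head over the union action space (GAS) whose output logits are masked per agent type, and a cross-entropy-style loss for predicting another group's action policy. All names, dimensions, and network shapes here are illustrative assumptions; the paper's actual architectures and exact loss form may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes, not taken from the paper.
OBS_DIM, HIDDEN_DIM, GAS_DIM = 64, 128, 10  # GAS_DIM = size of the union action space


class GASPolicy(nn.Module):
    """Shared policy head over a Generalized Action Space (GAS) sketch.

    All agent types share these parameters; heterogeneity is recovered by
    masking out actions whose semantics a given agent type cannot execute.
    """

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN_DIM), nn.ReLU())
        self.head = nn.Linear(HIDDEN_DIM, GAS_DIM)  # unified representation in the GAS

    def forward(self, obs: torch.Tensor, avail_mask: torch.Tensor) -> torch.Tensor:
        # avail_mask: bool tensor, True where an action semantic is available.
        logits = self.head(self.encoder(obs))
        logits = logits.masked_fill(~avail_mask, float("-inf"))  # drop unavailable semantics
        return F.softmax(logits, dim=-1)  # masked entries get probability zero


def cgp_loss(predicted_logits: torch.Tensor,
             other_group_policy: torch.Tensor,
             avail_mask: torch.Tensor) -> torch.Tensor:
    """Cross-Group Prediction loss sketch: cross-entropy between a prediction
    head's output and the observed action policy of another group.

    A large negative fill (rather than -inf) avoids NaNs where the target
    policy is zero on unavailable actions.
    """
    log_pred = F.log_softmax(predicted_logits.masked_fill(~avail_mask, -1e9), dim=-1)
    return -(other_group_policy * log_pred).sum(dim=-1).mean()


# Example usage: 4 agents of one type that can only use the first 6 semantics.
obs = torch.randn(4, OBS_DIM)
mask = torch.zeros(4, GAS_DIM, dtype=torch.bool)
mask[:, :6] = True
pi = GASPolicy()(obs, mask)  # per-agent policy; masked actions have zero probability
```

The design intuition, as the abstract frames it, is that masking within one unified action space keeps a single set of shared parameters while preventing gradients from semantically conflicting actions, avoiding both the action semantic conflict of naive sharing and the cost of fully independent parameters.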

Keywords