Applied Sciences (Feb 2025)

Solving Action Semantic Conflict in Physically Heterogeneous Multi-Agent Reinforcement Learning with Generalized Action-Prediction Optimization

  • Xiaoyang Yu,
  • Youfang Lin,
  • Shuo Wang,
  • Sheng Han

DOI
https://doi.org/10.3390/app15052580
Journal volume & issue
Vol. 15, no. 5
p. 2580

Abstract

Traditional multi-agent reinforcement learning (MARL) algorithms typically share parameters globally across heterogeneous agent types without differentiating between their distinct action semantics. This gives rise to the action semantic conflict problem, which weakens the generalization of policy networks across heterogeneous agent types and impairs cooperation in complex scenarios. Conversely, fully independent per-agent parameters significantly escalate computational cost and training complexity. To address these challenges, we introduce an adaptive MARL algorithm named Generalized Action-Prediction Optimization (GAPO). First, we introduce the Generalized Action Space (GAS), the union of all agent actions with distinct semantics. Every agent first computes a unified representation in the GAS and then generates its heterogeneous action policy through a type-specific available-action mask. Second, to further improve cooperation between heterogeneous groups, we propose a Cross-Group Prediction (CGP) loss, which adaptively predicts the action policies of other groups from trajectory information. We integrate GAPO into both value-based and policy-based MARL algorithms, yielding two practical algorithms: G-QMIX and G-MAPPO. Experimental results in the SMAC, MPE, MAMuJoCo, and RPE environments demonstrate the superiority of G-QMIX and G-MAPPO over several state-of-the-art MARL methods, validating the effectiveness of our adaptive generalized MARL approach.
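The PyTorch sketch below illustrates the two mechanisms the abstract describes: a shared policy head over the union action space (GAS) whose output logits are masked per agent type, and a cross-entropy-style loss for predicting another group's action policy. All names, dimensions, and network shapes here are illustrative assumptions; the paper's actual architectures and exact loss form may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes, not taken from the paper.
OBS_DIM, HIDDEN_DIM, GAS_DIM = 64, 128, 10  # GAS_DIM = size of the union action space


class GASPolicy(nn.Module):
    """Shared policy head over a Generalized Action Space (GAS) sketch.

    All agent types share these parameters; heterogeneity is recovered by
    masking out actions whose semantics a given agent type cannot execute.
    """

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN_DIM), nn.ReLU())
        self.head = nn.Linear(HIDDEN_DIM, GAS_DIM)  # unified representation in the GAS

    def forward(self, obs: torch.Tensor, avail_mask: torch.Tensor) -> torch.Tensor:
        # avail_mask: bool tensor, True where an action semantic is available.
        logits = self.head(self.encoder(obs))
        logits = logits.masked_fill(~avail_mask, float("-inf"))  # drop unavailable semantics
        return F.softmax(logits, dim=-1)  # masked entries get probability zero


def cgp_loss(predicted_logits: torch.Tensor,
             other_group_policy: torch.Tensor,
             avail_mask: torch.Tensor) -> torch.Tensor:
    """Cross-Group Prediction loss sketch: cross-entropy between a prediction
    head's output and the observed action policy of another group.

    A large negative fill (rather than -inf) avoids NaNs where the target
    policy is zero on unavailable actions.
    """
    log_pred = F.log_softmax(predicted_logits.masked_fill(~avail_mask, -1e9), dim=-1)
    return -(other_group_policy * log_pred).sum(dim=-1).mean()


# Example usage: 4 agents of one type that can only use the first 6 semantics.
obs = torch.randn(4, OBS_DIM)
mask = torch.zeros(4, GAS_DIM, dtype=torch.bool)
mask[:, :6] = True
pi = GASPolicy()(obs, mask)  # per-agent policy; masked actions have zero probability
```

The design intuition, as the abstract frames it, is that masking within one unified action space keeps a single set of shared parameters while preventing gradients from semantically conflicting actions, avoiding both the action semantic conflict of naive sharing and the cost of fully independent parameters.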

Keywords