Mutual information oriented deep skill chaining for multi‐agent reinforcement learning

Zaipeng Xie; Cheng Ji; Chentai Qiao; WenZhan Song; Zewen Li; Yufeng Zhang; Yujing Zhang

doi:10.1049/cit2.12322

CAAI Transactions on Intelligence Technology (Aug 2024)

Affiliations

Zaipeng Xie: Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, China
Cheng Ji: College of Computer and Information, Hohai University, Nanjing, China
Chentai Qiao: College of Computer and Information, Hohai University, Nanjing, China
WenZhan Song: Center for Cyber‐Physical Systems, University of Georgia, Athens, Georgia, USA
Zewen Li: Information Networking Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Yufeng Zhang: College of Computer and Information, Hohai University, Nanjing, China
Yujing Zhang: Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Journal volume & issue: Vol. 9, no. 4
pp. 1014–1030

Abstract

Multi‐agent reinforcement learning relies on reward signals to guide the policy network of each agent. However, in high‐dimensional continuous spaces, the non‐stationary environment can yield outdated experiences that hinder convergence and degrade the training performance of multi‐agent systems. To tackle this issue, a novel reinforcement learning scheme, Mutual Information Oriented Deep Skill Chaining (MioDSC), is proposed. MioDSC generates an optimised cooperative policy by incorporating intrinsic rewards based on mutual information to improve exploration efficiency. These rewards encourage agents to diversify their learning process by taking actions that increase the mutual information between their actions and the environment state. In addition, MioDSC can generate cooperative policies using the options framework, which allows agents to learn and reuse complex action sequences and accelerates the convergence of multi‐agent learning. MioDSC was evaluated in the multi‐agent particle environment and the StarCraft multi‐agent challenge at varying difficulty levels. The experimental results show that MioDSC outperforms state‐of‐the‐art methods and performs robustly, with high stability, across various multi‐agent system tasks.
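As a rough illustration of a mutual‐information‐based intrinsic reward, the following Python sketch estimates, from a batch of discrete (state, action) samples, the pointwise mutual information log p(a | s) − log p(a) and adds it, scaled by a weight, to the extrinsic reward. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify MioDSC's estimator, and since MioDSC targets high‐dimensional continuous spaces it would need a learned estimator rather than the count‐based plug‐in estimate used here. The function names and the coefficient `beta` are hypothetical, and the options/skill‐chaining component is not modelled.

```python
import math
from collections import Counter


def pointwise_mi_rewards(states, actions):
    """Hypothetical helper: pointwise MI log p(a|s) - log p(a) for each sample.

    Assumes discrete, hashable states and actions; probabilities are
    empirical frequencies in the batch (a plug-in estimate).
    """
    n = len(states)
    joint = Counter(zip(states, actions))  # counts of (s, a) pairs
    state_counts = Counter(states)         # counts of s
    action_counts = Counter(actions)       # counts of a
    rewards = []
    for s, a in zip(states, actions):
        p_a_given_s = joint[(s, a)] / state_counts[s]
        p_a = action_counts[a] / n
        rewards.append(math.log(p_a_given_s) - math.log(p_a))
    return rewards


def mutual_information(states, actions):
    """Plug-in estimate of I(A; S): the batch mean of the pointwise MI."""
    bonuses = pointwise_mi_rewards(states, actions)
    return sum(bonuses) / len(bonuses)


def shaped_rewards(extrinsic, states, actions, beta=0.1):
    """Extrinsic reward plus beta * MI bonus (beta is an assumed weight)."""
    intrinsic = pointwise_mi_rewards(states, actions)
    return [r + beta * b for r, b in zip(extrinsic, intrinsic)]


if __name__ == "__main__":
    states = ["s0", "s0", "s1", "s1"]
    # Actions fully determined by the state: I(A; S) = log 2.
    print(mutual_information(states, ["a0", "a0", "a1", "a1"]))
    # Actions independent of the state: I(A; S) = 0.
    print(mutual_information(states, ["a0", "a1", "a0", "a1"]))
```

Averaging the pointwise bonuses over the batch recovers the plug‐in estimate of I(A; S), so actions that are informative about the state receive a positive bonus, while actions unrelated to the state receive none.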

Published in CAAI Transactions on Intelligence Technology

ISSN: 2468-2322 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/24682322