Applied Sciences (Aug 2024)
Target-Oriented Multi-Agent Coordination with Hierarchical Reinforcement Learning
Abstract
In target-oriented multi-agent tasks, agents collaborate to achieve goals defined by specific objects, or targets, in their environment. Success hinges on effective coordination between agents and these targets, especially in dynamic environments where targets may shift; agents must adapt to such changes and re-evaluate their interactions with targets. Inefficient coordination wastes resources, prolongs task completion, and lowers overall performance. To address this challenge, we introduce regulatory hierarchical multi-agent coordination (RHMC), a hierarchical reinforcement learning approach. RHMC divides the coordination task into two levels: a high-level policy that assigns targets based on the environmental state, and a low-level policy that executes primitive actions guided by each agent’s target assignment and local observations. Stabilizing RHMC’s high-level policy is crucial for effective learning. This stability is achieved through reward regularization, which reduces the high-level policy’s reliance on the still-evolving low-level policy and keeps it focused on broad coordination rather than the specifics of individual agent actions. By minimizing this dependence, RHMC adapts more readily to environmental changes and learns more efficiently. Experiments show that RHMC outperforms existing methods in both global reward and learning efficiency, demonstrating its effectiveness for multi-agent coordination.
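To make the two-level decomposition concrete, the minimal Python sketch below separates a high-level target-assignment policy from a low-level action policy and shows one possible shape of the reward regularization; all class names, weight shapes, and the specific regularizer are illustrative assumptions rather than the implementation reported in the paper.

# Minimal sketch of the two-level decomposition (illustrative names and shapes).
import numpy as np

class HighLevelPolicy:
    """Assigns one target to each agent from the global environment state."""
    def __init__(self, n_agents, n_targets, state_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One linear scoring head per agent: state -> a score for every target.
        self.weights = rng.normal(scale=0.1, size=(n_agents, n_targets, state_dim))

    def assign_targets(self, state):
        scores = self.weights @ state        # shape (n_agents, n_targets)
        return scores.argmax(axis=1)         # one target index per agent

class LowLevelPolicy:
    """Maps an agent's local observation plus its assigned target to a primitive action."""
    def __init__(self, input_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(scale=0.1, size=(n_actions, input_dim))

    def act(self, obs_with_target):
        return int((self.weights @ obs_with_target).argmax())

def regularized_high_level_reward(global_reward, low_level_returns, beta=0.5):
    # Hypothetical regularizer: damp the portion of the high-level reward that
    # tracks the variability of the still-changing low-level returns.
    return global_reward - beta * float(np.var(low_level_returns))

# Example: three agents choose among four targets from an 8-dimensional state.
n_agents, n_targets, state_dim, obs_dim, n_actions = 3, 4, 8, 6, 5
high = HighLevelPolicy(n_agents, n_targets, state_dim)
low = LowLevelPolicy(obs_dim + n_targets, n_actions)
assignments = high.assign_targets(np.ones(state_dim))
for agent_id, target in enumerate(assignments):
    obs = np.concatenate([np.ones(obs_dim), np.eye(n_targets)[target]])
    action = low.act(obs)

Conditioning the low-level policy on a one-hot target encoding, as in this sketch, is one common way to let a single action policy serve every assignment; the paper’s actual conditioning mechanism may differ.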
Keywords