IEEE Access (Jan 2023)

An Efficient Centralized Multi-Agent Reinforcement Learner for Cooperative Tasks

  • Dengyu Liao
  • Zhen Zhang
  • Tingting Song
  • Mingyang Liu

DOI: https://doi.org/10.1109/ACCESS.2023.3340867
Journal volume & issue: Vol. 11, pp. 139284–139294

Abstract

Multi-agent reinforcement learning (MARL) for cooperative tasks has been extensively researched over the past decade. The prevalent framework for MARL algorithms is centralized training with decentralized execution, in which Q-learning is often employed as the centralized learner. However, Q-learning must find the maximum value by comparing the Q-values of all joint actions a′ in the next state s′ in order to update the Q-value of the last visited state-action pair (s, a). When the joint action space is large, this comparison-based maximization operation is time-consuming and becomes the dominant computational burden of the algorithm. To tackle this issue, we propose an algorithm that reduces the number of comparisons by caching the joint actions with the top-2 Q-values (T2Q). Updating the top-2 Q-values involves seven cases, and in five of these seven cases the T2Q algorithm avoids traversing the Q-table, thus alleviating the computational burden. Theoretical analysis demonstrates that the upper bound on the expected ratio of comparisons between T2Q and Q-learning decreases as the number of agents increases. Simulation results on two-stage stochastic games are consistent with the theoretical analysis. Furthermore, the effectiveness of the T2Q algorithm is validated on a distributed sensor network task and a target transportation task: T2Q completes both tasks with a 100% success rate and minimal computational overhead.
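To make the top-2 bookkeeping concrete, below is a minimal Python sketch of the caching idea the abstract describes. It is not the authors' implementation: the class name T2QTable, the exact case split, and all parameters are illustrative assumptions. The sketch reads max_a′ Q(s′, a′) in O(1) from the cache, and only two of its update branches fall back to a full traversal of the joint-action axis, mirroring the abstract's claim that most update cases avoid a scan.

    import numpy as np

    class T2QTable:
        """Tabular Q-learner that caches, per state, the two joint actions
        with the largest Q-values, so the max over the joint action space
        is an O(1) lookup and most updates avoid a full scan."""

        def __init__(self, n_states, n_joint_actions, alpha=0.1, gamma=0.95):
            assert n_joint_actions >= 2
            self.q = np.zeros((n_states, n_joint_actions))
            self.top1 = np.zeros(n_states, dtype=int)  # best joint action per state
            self.top2 = np.ones(n_states, dtype=int)   # second-best joint action
            self.alpha, self.gamma = alpha, gamma
            self.scans = 0                              # full traversals performed

        def max_q(self, s):
            # O(1) read of max_a' Q(s', a') for the TD target.
            return self.q[s, self.top1[s]]

        def _rescan(self, s):
            # Costly path: traverse the joint-action axis to rebuild the cache.
            self.scans += 1
            order = np.argsort(self.q[s])
            self.top1[s], self.top2[s] = order[-1], order[-2]

        def update(self, s, a, r, s_next):
            old = self.q[s, a]
            target = r + self.gamma * self.max_q(s_next)
            self.q[s, a] = old + self.alpha * (target - old)
            v = self.q[s, a]
            t1, t2 = self.top1[s], self.top2[s]
            if a == t1:
                if v < self.q[s, t2]:
                    self._rescan(s)            # new second-best unknown -> scan
                # else: still the maximum; cache unchanged
            elif a == t2:
                if v > self.q[s, t1]:
                    self.top1[s], self.top2[s] = a, t1   # swap; no scan
                elif v < old:
                    self._rescan(s)            # may have fallen below 3rd-best
                # else: grew but stayed below top1; cache unchanged
            else:
                if v > self.q[s, t1]:
                    self.top1[s], self.top2[s] = a, t1   # new best; no scan
                elif v > self.q[s, t2]:
                    self.top2[s] = a           # new second-best; no scan
                # else: ordering unchanged; no scan

In use, one transition is absorbed with learner.update(s, a, r, s_next), and learner.scans exposes how often the fallback traversal fired; under the assumptions above, the ratio of scans to updates is the kind of quantity the paper's comparison-ratio analysis bounds.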
