A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents

Zhen Zhang; Dongqing Wang; Dongbin Zhao; Qiaoni Han; Tingting Song

doi:10.1109/ACCESS.2018.2878853

IEEE Access (Jan 2018)

Affiliations

Zhen Zhang: School of Automation, Qingdao University, Qingdao, China
Dongqing Wang: School of Automation, Qingdao University, Qingdao, China
Dongbin Zhao: State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Qiaoni Han: School of Automation, Qingdao University, Qingdao, China
Tingting Song: School of Automation, Qingdao University, Qingdao, China

DOI: https://doi.org/10.1109/ACCESS.2018.2878853
Journal volume & issue: Vol. 6, pp. 70223 – 70235

Abstract

Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeated games. Theoretical analyses show that in a finite-player, finite-action repeated game with two pure optimal joint actions that share no common component action, both optimal joint actions are stable critical points of the PMR-IGA model. Furthermore, we apply the Q-value function to estimate the gradient and derive the probability of maximal reward based on estimated gradient ascent (PMR-EGA) algorithm. Theoretical analyses and simulations of case studies of repeated games show that the maximal total reward can be achieved under any initial conditions. The PMR-EGA can be naturally extended to optimize cooperative stochastic games. Two stochastic games, i.e., box pushing and a distributed sensor network, are used as test beds. The simulations show that the PMR-EGA consistently performs well in both stochastic games.
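
To make the setting in the abstract concrete, the following is a minimal sketch of plain gradient ascent on the expected common reward in a two-player, two-action cooperative repeated game. It is not the PMR-IGA or PMR-EGA update, whose details the abstract does not give. The payoff matrix, step size, and function names are assumptions chosen for illustration; the matrix has two pure optimal joint actions, (0, 0) and (1, 1), that share no component action.

```python
import numpy as np

# Hypothetical common-reward matrix (an assumption, not taken from the paper).
# REWARD[i, j] is the reward both agents receive for joint action (i, j).
REWARD = np.array([[1.0, 0.0],
                   [0.0, 1.0]])


def expected_reward(p, q, reward=REWARD):
    """Expected common reward when agent 1 plays action 0 with
    probability p and agent 2 plays action 0 with probability q."""
    x = np.array([p, 1.0 - p])
    y = np.array([q, 1.0 - q])
    return float(x @ reward @ y)


def gradient(p, q, reward=REWARD):
    """Partial derivatives of the expected reward with respect to p and q."""
    u = reward[0, 0] - reward[0, 1] - reward[1, 0] + reward[1, 1]
    dp = q * u + (reward[0, 1] - reward[1, 1])
    dq = p * u + (reward[1, 0] - reward[1, 1])
    return float(dp), float(dq)


def gradient_ascent(p, q, step=0.01, iters=5000, reward=REWARD):
    """Each agent follows its own partial derivative with a small step;
    probabilities are clipped to the valid range [0, 1]."""
    for _ in range(iters):
        dp, dq = gradient(p, q, reward)
        p = float(np.clip(p + step * dp, 0.0, 1.0))
        q = float(np.clip(q + step * dq, 0.0, 1.0))
    return p, q


if __name__ == "__main__":
    for start in [(0.6, 0.7), (0.3, 0.2), (0.5, 0.5)]:
        p, q = gradient_ascent(*start)
        print(start, "->", (p, q), "expected reward:", expected_reward(p, q))
```

In this toy game the uniform mixed strategy (0.5, 0.5) is a critical point with zero gradient, so plain gradient ascent started exactly there does not move. The abstract's claim of reaching the maximal total reward from any initial condition is made for the paper's PMR-EGA algorithm, not for this sketch.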

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
