Optimal consensus control for multi‐agent systems: Multi‐step policy gradient adaptive dynamic programming method

Lianghao Ji; Kai Jian; Cuijuan Zhang; Shasha Yang; Xing Guo; Huaqing Li

doi:10.1049/cth2.12473

IET Control Theory & Applications (Jul 2023)

Affiliations

Lianghao Ji: Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
Kai Jian: Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
Cuijuan Zhang: Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
Shasha Yang: Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
Xing Guo: Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
Huaqing Li: College of Electronic and Information Engineering, Southwest University, Chongqing, China

DOI: https://doi.org/10.1049/cth2.12473
Journal volume & issue: Vol. 17, no. 11, pp. 1443–1457

Abstract

This paper presents a novel adaptive dynamic programming (ADP) method for solving the optimal consensus problem for a class of discrete‐time multi‐agent systems with completely unknown dynamics. Unlike classical reinforcement learning (RL)‐based optimal control algorithms, which rely on the one‐step temporal difference method, the proposed multi‐step (also called n‐step) policy gradient ADP (MS‐PGADP) algorithm is used to obtain the iterative control policies; multi‐step methods have been shown to be more efficient because they propagate the reward faster. Moreover, a novel Q‐function is defined to evaluate the performance of taking an action in the current state. Then, using the Lyapunov stability theorem and functional analysis, the optimality of the performance index function and the stability of the error system are proved. Furthermore, actor‐critic neural networks (NNs) are used to implement the proposed method. Inspired by the deep Q‐network, a target network is also introduced to guarantee the stability of the NNs during training. Finally, two simulations are conducted to verify the effectiveness of the proposed algorithm.
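The difference between a one‐step and a multi‐step temporal difference target can be illustrated with a minimal sketch. It is not the authors' MS‐PGADP implementation: the function name `n_step_target`, the discount factor `gamma`, and the numeric utilities below are assumptions made only for illustration. The sketch accumulates n observed stage utilities and then bootstraps from a Q‐value estimate (for example, one supplied by a target network) at the state reached after n steps.

```python
from typing import Sequence


def n_step_target(utilities: Sequence[float],
                  bootstrap_q: float,
                  gamma: float = 1.0) -> float:
    """Hypothetical helper: n-step temporal difference target.

    Computes sum_{k=0}^{n-1} gamma**k * utilities[k] + gamma**n * bootstrap_q,
    where n = len(utilities) and bootstrap_q is an estimate of the Q-value
    at the state reached after n steps (assumed to come from a target network).
    """
    target = bootstrap_q
    # Fold the utilities in from the last step back to the first.
    for u in reversed(utilities):
        target = u + gamma * target
    return target


# With a single utility this reduces to the classical one-step target.
one_step = n_step_target([1.0], bootstrap_q=10.0, gamma=0.9)

# With three utilities, three observed costs enter the target directly
# before the bootstrap estimate is used.
three_step = n_step_target([1.0, 2.0, 3.0], bootstrap_q=10.0, gamma=0.9)
```

In the multi‐step case more of the target comes from observed utilities and less from the bootstrap estimate, which is the sense in which the reward is propagated faster.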

Published in IET Control Theory & Applications

ISSN: 1751-8644 (Print); 1751-8652 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Mechanical engineering and machinery: Control engineering systems. Automatic machinery (General)
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17518652
