Learning near‐optimal broadcasting intervals in decentralized multi‐agent systems using online least‐square policy iteration

Ivana Palunko; Domagoj Tolić; Vicko Prkačin

doi:10.1049/cth2.12102

IET Control Theory & Applications (May 2021)

Affiliations

Ivana Palunko: LARIAT‐Laboratory for Intelligent Autonomous Systems University of Dubrovnik Dubrovnik Croatia
Domagoj Tolić: LARIAT‐Laboratory for Intelligent Autonomous Systems RIT Croatia Dubrovnik Croatia
Vicko Prkačin: LARIAT‐Laboratory for Intelligent Autonomous Systems University of Dubrovnik Dubrovnik Croatia

DOI: https://doi.org/10.1049/cth2.12102
Journal volume & issue: Vol. 15, no. 8, pp. 1054–1067

Abstract

Here, agents learn how often to exchange information with neighbours in cooperative multi‐agent systems (MASs) such that their linear quadratic regulator (LQR)‐like performance indices are minimized. The investigated LQR‐like cost functions capture trade‐offs between the energy consumption of each agent and MAS local control performance in the presence of exogenous disturbances and delayed, noisy data. Agent energy consumption is critical for prolonging the MAS mission and is composed of both control (e.g. acceleration, velocity) and communication efforts. Taking provably stabilizing upper bounds on broadcasting intervals as optimization constraints, an online off‐policy model‐free learning algorithm based on least‐square policy iteration (LSPI) is employed to minimize the cost function of each agent. Consequently, the obtained broadcasting intervals adapt to the most recent information (e.g. delayed and noisy agents' inputs and/or outputs) received from neighbours whilst provably stabilizing the MAS. Chebyshev polynomials are utilized as the approximator in the LSPI, whereas Kalman filtering handles sampled, corrupted, and delayed data. Subsequently, convergence and near‐optimality of our LSPI scheme are inspected. The proposed methodology is verified experimentally using an inexpensive motion capture system and nano quadrotors.
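
To illustrate the kind of scheme the abstract describes, the following is a minimal sketch of generic off‐policy LSPI with a Chebyshev polynomial approximator. It is not the authors' implementation. It is a batch variant rather than the online one in the paper, it omits Kalman filtering, and every name and number in it is an assumption made for illustration: the candidate intervals, the bound `TAU_MAX`, the scalar toy dynamics in `step`, and the stage cost.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander

# All values below are hypothetical and chosen only for illustration.
CANDIDATES = np.array([0.2, 0.5, 1.0, 2.0])   # candidate broadcasting intervals [s]
TAU_MAX = 1.0                                  # assumed stabilizing upper bound [s]
INTERVALS = CANDIDATES[CANDIDATES <= TAU_MAX]  # constraint: keep admissible intervals only
DEGREE = 4                                     # highest Chebyshev degree used
GAMMA = 0.9                                    # discount factor
N_FEAT = len(INTERVALS) * (DEGREE + 1)


def features(x, a):
    """Chebyshev basis T_0..T_DEGREE of x, placed in the block of action a."""
    phi = np.zeros(N_FEAT)
    basis = chebvander(np.atleast_1d(np.clip(x, -1.0, 1.0)), DEGREE)[0]
    phi[a * (DEGREE + 1):(a + 1) * (DEGREE + 1)] = basis
    return phi


def greedy(x, w):
    """Index of the admissible interval with the lowest approximate cost-to-go."""
    return int(np.argmin([features(x, a) @ w for a in range(len(INTERVALS))]))


def step(x, a, rng):
    """Toy stand-in for an agent: longer intervals let the local error grow,
    shorter intervals cost more communication energy."""
    tau = INTERVALS[a]
    x_next = np.clip((0.4 + 0.7 * tau) * x + 0.05 * rng.standard_normal(), -1.0, 1.0)
    cost = x ** 2 + 0.02 / tau
    return x_next, cost


def lstdq(samples, w):
    """Least-squares evaluation of the policy that is greedy with respect to w."""
    A = 1e-6 * np.eye(N_FEAT)  # small ridge term keeps A well conditioned
    b = np.zeros(N_FEAT)
    for x, a, cost, x_next in samples:
        phi = features(x, a)
        phi_next = features(x_next, greedy(x_next, w))
        A += np.outer(phi, phi - GAMMA * phi_next)
        b += phi * cost
    return np.linalg.solve(A, b)


def lspi(n_samples=2000, n_iter=10, seed=0):
    """Collect data with a random behaviour policy, then iterate LSTD-Q."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        a = int(rng.integers(len(INTERVALS)))
        x_next, cost = step(x, a, rng)
        samples.append((x, a, cost, x_next))
    w = np.zeros(N_FEAT)
    for _ in range(n_iter):
        w_new = lstdq(samples, w)
        converged = np.linalg.norm(w_new - w) < 1e-6
        w = w_new
        if converged:
            break
    return w
```

Calling `w = lspi()` and then `INTERVALS[greedy(x, w)]` returns the interval the learned policy would pick for a local error `x`. Because the policy only ever chooses from `INTERVALS`, the selected interval never exceeds `TAU_MAX`, which mirrors how the stabilizing upper bounds act as optimization constraints.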

Published in IET Control Theory & Applications

ISSN: 1751-8644 (Print); 1751-8652 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Mechanical engineering and machinery: Control engineering systems. Automatic machinery (General)
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17518652
