Multi‐agent reinforcement learning based transmission scheme for IRS‐assisted multi‐UAV systems

Yumo Mei; Chen Liu; Yunchao Song; Ge Wang; Huibin Liang

doi:10.1049/cmu2.12674

IET Communications (Oct 2023)

Affiliations

Yumo Mei: College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
Chen Liu: College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
Yunchao Song: College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
Ge Wang: College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
Huibin Liang: College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China

Journal volume & issue: Vol. 17, no. 17
pp. 2019 – 2029

Abstract

In this paper, a transmission scheme based on multi‐agent reinforcement learning is proposed for intelligent reflecting surface (IRS)‐assisted systems with multiple unmanned aerial vehicles (UAVs). The scheme combines reinforcement learning with an alternating optimization algorithm, which can effectively improve communication quality while ensuring fairness among users. It consists of two parts. In the first part, the multi‐UAV cooperation problem is modeled as a Markov decision process in which the objective of each UAV is to maximize the minimum user channel gain. To obtain stable strategies for all agents, the Multi‐agent Deep Deterministic Policy Gradient (MADDPG) algorithm is applied to train the UAV trajectories to reach a Nash equilibrium. MADDPG is trained centrally at the base station and executed in a distributed manner by each UAV, ensuring efficient and effective coordination among agents. In the second part, an alternating optimization algorithm is formulated to optimize the active and passive beamforming. Because the fairness objective is non‐convex, auxiliary variables and the semi‐definite relaxation method are used to transform the problem of maximizing the minimum user achievable rate into a feasibility problem. Simulation results show that the proposed scheme can effectively train the UAV trajectories and fairly improve the communication performance of all users.
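The abstract says that, after introducing auxiliary variables, maximizing the minimum user rate becomes a feasibility problem. A common way to use such a feasibility problem is bisection on the auxiliary rate target t: if every user can reach rate t, search higher, otherwise search lower. The abstract does not give the paper's exact procedure, so the following is only a minimal sketch under stated assumptions. The SDR-based feasibility check over active and passive beamforming is replaced by a hypothetical closed-form oracle for interference-free (orthogonal) users, and the names `bisect_max_min` and `make_toy_oracle`, the channel gains, the power budget, and the noise power are all illustrative, not taken from the paper.

```python
import math
from typing import Callable, Sequence


def bisect_max_min(
    feasible: Callable[[float], bool],
    lo: float,
    hi: float,
    tol: float = 1e-6,
) -> float:
    """Return (within tol) the largest t in [lo, hi] with feasible(t) True.

    Assumes feasibility is monotone in t: if rate t is achievable by every
    user, any smaller common rate is achievable as well.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid  # common rate mid is achievable: search higher
        else:
            hi = mid  # common rate mid is not achievable: search lower
    return lo


def make_toy_oracle(
    gains: Sequence[float], p_max: float, noise: float
) -> Callable[[float], bool]:
    """Hypothetical stand-in for the paper's SDR feasibility problem.

    Assumption: users are interference-free, so user k reaches rate t
    (bit/s/Hz) with power p_k = (2**t - 1) * noise / g_k, and the target t
    is feasible iff the total required power fits within p_max.
    """
    def feasible(t: float) -> bool:
        needed = sum((2.0 ** t - 1.0) * noise / g for g in gains)
        return needed <= p_max

    return feasible


if __name__ == "__main__":
    gains = [1.0, 0.5, 0.25]  # hypothetical effective channel gains
    p_max, noise = 10.0, 1.0  # hypothetical power budget and noise power
    t_star = bisect_max_min(make_toy_oracle(gains, p_max, noise), lo=0.0, hi=20.0)
    # Closed form for this toy model: t* = log2(1 + P / (noise * sum(1/g_k)))
    closed_form = math.log2(1.0 + p_max / (noise * sum(1.0 / g for g in gains)))
    print(f"bisection: {t_star:.5f}  closed form: {closed_form:.5f}")
```

Because the toy oracle has a closed-form optimum, the bisection result can be checked directly. In the setting described by the abstract, each call to the oracle would instead solve a semidefinite feasibility problem for fixed t, which this sketch does not attempt.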

Published in IET Communications

ISSN: 1751-8628 (Print); 1751-8636 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17518636