Multi-Objective Optimization in Air-to-Air Communication System Based on Multi-Agent Deep Reinforcement Learning

Shaofu Lin; Yingying Chen; Shuopeng Li

doi:10.3390/s23239541

Sensors (Nov 2023)

Affiliations

Shaofu Lin: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Yingying Chen: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Shuopeng Li: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

DOI: https://doi.org/10.3390/s23239541
Journal volume & issue: Vol. 23, no. 23, p. 9541

Abstract

With the advantages of real-time data processing and flexible deployment, unmanned aerial vehicle (UAV)-assisted mobile edge computing systems are widely used in both civil and military fields. However, because of their limited energy, UAVs usually find it difficult to stay in the air for long periods while performing computational tasks. In this paper, we propose a full-duplex air-to-air communication system (A2ACS) model that combines mobile edge computing and wireless power transfer technologies. The aim is to reduce the computational latency and energy consumption of UAVs while ensuring that they do not interrupt the mission or leave the work area because of insufficient energy. In this system, UAVs harvest energy from external air-edge energy servers (AEESs) to charge their onboard batteries and offload computational tasks to the AEESs to reduce latency. To optimize the system’s performance, we develop a multi-objective optimization framework that balances four objectives: the system throughput, the number of low-power alarms of UAVs, the total energy received by UAVs and the energy consumption of AEESs. Because AEESs must make decisions rapidly in a dynamic environment, we propose an algorithm based on multi-agent deep deterministic policy gradient (MADDPG) to optimize the AEESs’ service locations and to control the power of energy transfer. During training, the agents learn the optimal policy under the given optimization weight conditions. Furthermore, we adopt the K-means algorithm to determine the association between AEESs and UAVs to ensure fairness. Simulation results show that the proposed MODDPG (multi-objective DDPG) algorithm outperforms the baseline algorithms, including the genetic algorithm and other deep reinforcement learning algorithms.
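
As an illustration of the association step mentioned in the abstract, the following is a minimal sketch of plain K-means (Lloyd's algorithm) used to group UAVs by position and assign each group to one AEES. It is not the authors' implementation: the function name `associate_uavs`, its parameters, the use of 2-D coordinates and the Euclidean distance are all assumptions made for this example, and the sketch does not include any fairness constraint beyond what standard K-means provides.

```python
import numpy as np


def associate_uavs(uav_xy, n_aees, n_iter=50, seed=0):
    """Cluster UAV positions with plain K-means (hypothetical helper).

    Returns (labels, centroids): labels[i] is the index of the AEES
    associated with UAV i, and centroids[k] is the mean position of the
    UAVs associated with AEES k.
    """
    uav_xy = np.asarray(uav_xy, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids from the positions of distinct UAVs.
    start = rng.choice(len(uav_xy), size=n_aees, replace=False)
    centroids = uav_xy[start]
    labels = np.zeros(len(uav_xy), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each UAV joins its nearest centroid.
        diffs = uav_xy[:, None, :] - centroids[None, :, :]
        labels = np.linalg.norm(diffs, axis=2).argmin(axis=1)
        # Update step: move each centroid to the mean of its UAVs.
        new_centroids = centroids.copy()
        for k in range(n_aees):
            members = uav_xy[labels == k]
            if len(members) > 0:
                new_centroids[k] = members.mean(axis=0)
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Calling `associate_uavs(positions, n_aees=2)` on two well-separated groups of UAVs places each group under its own AEES index. In a full system the resulting association would feed the learning agents that choose service locations and transfer power; that part is outside the scope of this sketch.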

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
