Sensors (Oct 2024)

Multi-Agent DRL for Air-to-Ground Communication Planning in UAV-Enabled IoT Networks

  • Khalid Ibrahim Qureshi,
  • Bingxian Lu,
  • Cheng Lu,
  • Muhammad Ali Lodhi,
  • Lei Wang

DOI
https://doi.org/10.3390/s24206535
Journal volume & issue
Vol. 24, no. 20
p. 6535

Abstract

In this paper, we present a method to improve sum-rate efficiency in full-duplex unmanned aerial vehicle (UAV)-assisted communication networks. Existing approaches often couple uplink and downlink associations, which leads to suboptimal performance, particularly in dynamic environments where user demands and network conditions are unpredictable. To overcome this limitation, we propose decoupling the uplink and downlink associations of ground-based users (GBUs), which improves network efficiency. We formulate an optimization problem that jointly covers UAV trajectory design and user association, with the aim of maximizing the overall sum-rate efficiency of the network. Because the problem is non-convex, we reformulate it as a Partially Observable Markov Decision Process (POMDP), which lets UAVs make real-time decisions based on local observations without requiring complete global information. Our framework employs multi-agent deep reinforcement learning (MADRL), specifically the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, which combines centralized training with distributed execution. This allows UAVs to learn user associations and trajectory controls efficiently while adapting dynamically to local conditions. The proposed solution is particularly suited to critical applications such as disaster response and search and rescue missions, where UAVs can be used for rapid network deployment in emergencies. By addressing the limitations of existing purely centralized and purely distributed solutions, our hybrid model combines the benefits of centralized training with the adaptability of distributed inference to support UAV operation in real-time scenarios.
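To make the idea of decoupled association concrete, the following is a minimal sketch and not the authors' method. It assumes a toy setting in which each user's uplink and downlink rates to each UAV are fixed, known numbers (the tables `UPLINK_RATES` and `DOWNLINK_RATES` are invented for illustration), and it ignores interference, full-duplex self-interference, UAV capacity limits, and trajectory design. Under those assumptions, choosing the uplink and downlink UAV independently can never yield a lower sum-rate than forcing both directions onto the same UAV.

```python
# Toy illustration only: these rate tables are made-up numbers, not data
# from the paper. rates[u][k] is the rate (arbitrary units) between
# ground-based user u and UAV k.
UPLINK_RATES = [
    [4.0, 1.0],
    [2.0, 3.0],
    [1.5, 2.5],
]
DOWNLINK_RATES = [
    [1.0, 5.0],
    [2.5, 2.0],
    [3.0, 1.0],
]


def coupled_association(ul, dl):
    """Each user uses ONE UAV for both directions (best UL + DL total)."""
    choices, total = [], 0.0
    for ul_row, dl_row in zip(ul, dl):
        k = max(range(len(ul_row)), key=lambda j: ul_row[j] + dl_row[j])
        choices.append((k, k))
        total += ul_row[k] + dl_row[k]
    return choices, total


def decoupled_association(ul, dl):
    """Each user picks its uplink UAV and downlink UAV independently."""
    choices, total = [], 0.0
    for ul_row, dl_row in zip(ul, dl):
        k_ul = max(range(len(ul_row)), key=lambda j: ul_row[j])
        k_dl = max(range(len(dl_row)), key=lambda j: dl_row[j])
        choices.append((k_ul, k_dl))
        total += ul_row[k_ul] + dl_row[k_dl]
    return choices, total


if __name__ == "__main__":
    coupled_choices, coupled_sum = coupled_association(UPLINK_RATES, DOWNLINK_RATES)
    decoupled_choices, decoupled_sum = decoupled_association(UPLINK_RATES, DOWNLINK_RATES)
    print("coupled  :", coupled_choices, coupled_sum)
    print("decoupled:", decoupled_choices, decoupled_sum)
```

In the paper's setting the rates depend on UAV positions and interference, so a greedy per-user choice like this would not be sufficient; that coupling is what motivates the learning-based approach described in the abstract.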

Keywords