GLIDE: Multi-Agent Deep Reinforcement Learning for Coordinated UAV Control in Dynamic Military Environments
Abstract
Unmanned aerial vehicles (UAVs) are widely used for missions in dynamic environments. Deep Reinforcement Learning (DRL) can discover effective strategies for multiple agents that must cooperate to complete a task. In this article, the challenge of controlling the movement of a fleet of UAVs is addressed with Multi-Agent Deep Reinforcement Learning (MARL). The collaborative movement of the fleet can be controlled either centrally or in a decentralized fashion, and both modes are studied in this work. We consider a dynamic military environment in which a fleet of UAVs must destroy enemy targets while avoiding obstacles such as mines. Because the UAVs have limited battery capacity, our research focuses on minimizing task completion time. We propose a continuous-time Proximal Policy Optimization (PPO) algorithm for multi-aGent Learning In Dynamic Environments (GLIDE). In GLIDE, the UAVs coordinate among themselves and communicate with a central base to choose the best possible action. Action selection in GLIDE can be performed in either a centralized or a decentralized manner, and two algorithms, Centralized-GLIDE (C-GLIDE) and Decentralized-GLIDE (D-GLIDE), are proposed on this basis. We developed a simulator called UAV SIM, in which mines are placed at randomly generated 2D locations that are unknown to the UAVs at the beginning of each episode. The performance of both proposed schemes is evaluated through extensive simulations. Both C-GLIDE and D-GLIDE converge and achieve comparable target destruction rates for the same number of targets and mines. We observe that D-GLIDE completes the task up to 68% faster than C-GLIDE and keeps more UAVs alive at the end of the task.
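The abstract only names the centralized and decentralized control modes; the sketch below is a minimal illustration (not the authors' implementation) of how action selection could differ between a C-GLIDE-style and a D-GLIDE-style PPO actor. All network sizes, dimensions, and names here are hypothetical placeholders chosen for the example.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration only (not taken from the paper).
NUM_UAVS, LOCAL_OBS_DIM, ACTION_DIM = 4, 16, 5

class PolicyHead(nn.Module):
    """Small PPO-style actor mapping an observation to a categorical action distribution."""
    def __init__(self, obs_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

# Centralized control (C-GLIDE-style): the base observes the joint state of the
# whole fleet and selects an action for every UAV.
central_policy = nn.ModuleList(
    [PolicyHead(NUM_UAVS * LOCAL_OBS_DIM, ACTION_DIM) for _ in range(NUM_UAVS)]
)

# Decentralized control (D-GLIDE-style): each UAV runs its own policy on its
# local observation only.
local_policies = nn.ModuleList(
    [PolicyHead(LOCAL_OBS_DIM, ACTION_DIM) for _ in range(NUM_UAVS)]
)

def select_actions(joint_obs, centralized):
    """joint_obs: tensor of shape (NUM_UAVS, LOCAL_OBS_DIM)."""
    if centralized:
        flat = joint_obs.reshape(-1)  # joint fleet state available at the base
        return [head(flat).sample() for head in central_policy]
    return [pi(obs).sample() for pi, obs in zip(local_policies, joint_obs)]

# Example call with random observations standing in for simulator output.
actions = select_actions(torch.randn(NUM_UAVS, LOCAL_OBS_DIM), centralized=False)
```

Under this reading, the two variants differ mainly in what each actor conditions on: the centralized policy sees the joint fleet observation, while each decentralized policy sees only its own UAV's local observation.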
Keywords