Multiagent Reinforcement Learning Based on Fusion-Multiactor-Attention-Critic for Multiple-Unmanned-Aerial-Vehicle Navigation Control

Sangwoo Jeon; Hoeun Lee; Vishnu Kumar Kaliappan; Tuan Anh Nguyen; Hyungeun Jo; Hyeonseo Cho; Dugki Min

doi:10.3390/en15197426

Energies (Oct 2022)

Affiliations

Sangwoo Jeon: Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea
Hoeun Lee: Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea
Vishnu Kumar Kaliappan: Konkuk Aerospace Design-Airworthiness Research Institute, Konkuk University, Seoul 05029, Korea
Tuan Anh Nguyen: Konkuk Aerospace Design-Airworthiness Research Institute, Konkuk University, Seoul 05029, Korea
Hyungeun Jo: Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea
Hyeonseo Cho: Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea
Dugki Min: Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea

DOI: https://doi.org/10.3390/en15197426
Journal volume & issue: Vol. 15, no. 19, p. 7426

Abstract

The proliferation of unmanned aerial vehicles (UAVs) has spawned a variety of intelligent services, in which efficient coordination plays a significant role in increasing the effectiveness of cooperative execution. However, because UAVs have limited operational time and range, achieving highly efficient coordinated actions is difficult, particularly in unknown dynamic environments. This paper proposes a multiagent deep reinforcement learning (MADRL)-based fusion-multiactor-attention-critic (F-MAAC) model for energy-efficient cooperative navigation control of multiple UAVs. The proposed model builds on the multiactor-attention-critic (MAAC) model and offers two significant advances over it. First, a sensor fusion layer enables the actor network to utilize all required sensor information effectively. Second, a layer that computes the dissimilarity weights of different agents is added to compensate for the information lost through the attention layer of the MAAC model. We use the UAV LDS (logistic delivery service) environment, created with the Unity engine, to train the proposed model and verify its energy efficiency. To validate energy efficiency, a feature that measures the total distance traveled by the UAVs is incorporated into the UAV LDS environment. To demonstrate its performance, the F-MAAC model is compared with several conventional reinforcement learning models in two use cases. First, we compare the F-MAAC model with the DDPG, MADDPG, and MAAC models based on the mean episode rewards over 20k episodes of training. Second, the two top-performing models (F-MAAC and MAAC) are retrained for 150k episodes. To represent energy efficiency, we measure the total number of deliveries completed within the same time period and within the same distance traveled. According to our simulation results, the F-MAAC model outperforms the MAAC model, making 38% more deliveries in 3000 time steps and 30% more deliveries per 1000 m of distance traveled.
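
The distance-based efficiency measure described in the abstract (deliveries per 1000 m traveled) can be illustrated with a short sketch. This is not the authors' implementation: the function names, the representation of a trajectory as a list of (x, y, z) waypoints in metres, and the sample numbers below are all assumptions made for illustration, and none of the values come from the paper.

```python
import math

def path_length(waypoints):
    """Total Euclidean length (in metres) of a path given as (x, y, z) waypoints."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))

def deliveries_per_1000m(deliveries, trajectories):
    """Deliveries completed per 1000 m of total distance flown by all UAVs.

    `trajectories` holds one waypoint list per UAV (hypothetical format).
    Returns 0.0 if no distance was traveled, to avoid division by zero.
    """
    total = sum(path_length(t) for t in trajectories)
    if total == 0:
        return 0.0
    return 1000.0 * deliveries / total

def relative_gain_percent(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return 100.0 * (new - baseline) / baseline

# Illustrative data only (not from the paper): two UAVs, 500 m each.
trajectories = [
    [(0, 0, 0), (300, 400, 0)],  # 500 m in the horizontal plane
    [(0, 0, 0), (0, 0, 500)],    # 500 m straight up
]
rate = deliveries_per_1000m(4, trajectories)
gain = relative_gain_percent(13, 10)
```

Under these assumptions, a statement such as "30% more deliveries per 1000 m" corresponds to `relative_gain_percent` applied to the two models' `deliveries_per_1000m` values; the time-based comparison would use the same relative gain on delivery counts over a fixed number of time steps.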

Published in Energies

ISSN: 1996-1073 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology
Website: http://www.mdpi.com/journal/energies
