Reinforcement Learning Algorithms for Autonomous Mission Accomplishment by Unmanned Aerial Vehicles: A Comparative View with DQN, SARSA and A2C

Gonzalo Aguilar Jiménez; Arturo de la Escalera Hueso; Maria J. Gómez-Silva

doi:10.3390/s23219013

Sensors (Nov 2023)

Affiliations

Gonzalo Aguilar Jiménez: Dana SAC Spain, S.A., Dana Off-Highway, C/Abedul S/N, Pol. Ind. Los Huertecillos, 28350 Ciempozuelos, Madrid, Spain
Arturo de la Escalera Hueso: Intelligent Systems Lab, Universidad Carlos III de Madrid, Avda de la Universidad 30, 28911 Leganés, Madrid, Spain
Maria J. Gómez-Silva: Department of Computer Architecture and Automation, Facultad de Ciencias Físicas, Universidad Complutense de Madrid, Plaza Ciencias 1, 28040 Madrid, Spain

DOI: https://doi.org/10.3390/s23219013
Journal volume & issue: Vol. 23, no. 21, p. 9013

Abstract

Unmanned aerial vehicles (UAVs) can be controlled in diverse ways. One of the most common is through artificial intelligence (AI), which comprises different methods, such as reinforcement learning (RL). This article compares three RL algorithms (DQN as the benchmark, SARSA as a same-family algorithm, and A2C as a different-structure one) on the problem of a UAV navigating from departure point A to endpoint B while avoiding obstacles and, simultaneously, using the least possible time and flying the shortest distance. Under fixed premises, the investigation reports the performance each algorithm obtained on this task. A neighborhood environment was selected because it is likely one of the most common areas of use for commercial drones. With DQN as the benchmark and no previous knowledge of the behavior of SARSA or A2C in the employed environment, the comparison showed that DQN was the only algorithm that achieved the target; SARSA and A2C did not. However, a deeper analysis of the results led to the conclusion that a fine-tuned A2C could outperform DQN under certain conditions, finding the maximum faster with a more straightforward structure.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
