A Supervised Reinforcement Learning Algorithm for Controlling Drone Hovering

Jiying Wu; Zhong Yang; Haoze Zhuo; Changliang Xu; Chi Zhang; Naifeng He; Luwei Liao; Zhiyong Wang

doi:10.3390/drones8030069

Drones (Feb 2024)

Affiliations

Jiying Wu: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Zhong Yang: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Haoze Zhuo: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Changliang Xu: College of Electronic Engineering, Nanjing XiaoZhuang University, Nanjing 211171, China
Chi Zhang: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Naifeng He: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Luwei Liao: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Zhiyong Wang: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

DOI: https://doi.org/10.3390/drones8030069
Journal volume & issue: Vol. 8, no. 3, p. 69

Abstract

Drones carrying various devices are increasingly used for aerial hovering operations, yet very little research applies reinforcement learning (RL) to hovering control, and none of it has been implemented on physical machines. The drone's action space for hover control is continuous and large, which makes it difficult for basic algorithms and value-based RL algorithms to achieve good results. To address this, this article applies a watcher-actor-critic (WAC) algorithm to drone hover control; the algorithm quickly locks onto the exploration direction and achieves highly robust hover control while improving learning efficiency and reducing learning costs. The article first uses the actor-critic algorithm based on the action-value function Q (QAC) and the deep deterministic policy gradient (DDPG) algorithm to learn drone hover control. It then proposes an actor-critic algorithm with an added watcher, in which the watcher uses a PID controller, with parameters provided by a neural network, as a dynamic monitor, transforming the learning process into supervised learning. Finally, the article runs simulations with Gym, a classic reinforcement learning environment library, and PARL, a current mainstream reinforcement learning framework, and deploys the algorithm in a practical environment. The physical experiments use an autonomous localization method for unmanned aerial vehicles based on a multi-sensor fusion strategy. The simulation and experimental results show that WAC needs 20% fewer training episodes than DDPG and 55% fewer than QAC, and that the proposed algorithm learns more efficiently, converges faster, and hovers more smoothly than QAC and DDPG.
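
The supervised element of the watcher idea can be illustrated with a small sketch. This is not the authors' WAC implementation: the abstract does not say how the watcher's output enters the actor-critic update, so the sketch only shows an actor being regressed toward a controller's actions. All names (`pd_watcher`, `train_actor_by_imitation`), the fixed gains, and the linear actor are assumptions made for illustration, and a PD law with hand-picked gains stands in for the PID controller whose parameters the paper obtains from a neural network.

```python
import random


def pd_watcher(error: float, velocity: float, kp: float = 4.0, kd: float = 3.0) -> float:
    """Hypothetical watcher: a PD law with fixed gains (an assumption;
    the paper uses a PID controller with network-provided parameters)."""
    return kp * error - kd * velocity


def train_actor_by_imitation(steps: int = 5000, lr: float = 0.1, seed: int = 0):
    """Fit a linear actor a = w_e * error + w_v * velocity to the watcher.

    Each step samples a hover state, asks the watcher for its action, and
    takes one gradient step on the squared difference between the actor's
    action and the watcher's action (a plain supervised regression).
    """
    rng = random.Random(seed)
    w_e, w_v = 0.0, 0.0  # actor weights, initially untrained
    for _ in range(steps):
        error = rng.uniform(-1.0, 1.0)     # altitude error, illustrative range
        velocity = rng.uniform(-1.0, 1.0)  # vertical velocity, illustrative range
        target = pd_watcher(error, velocity)
        action = w_e * error + w_v * velocity
        diff = action - target
        # Gradient of 0.5 * diff**2 with respect to each weight.
        w_e -= lr * diff * error
        w_v -= lr * diff * velocity
    return w_e, w_v


if __name__ == "__main__":
    # The learned weights should approach the watcher's gains (kp, -kd).
    print(train_actor_by_imitation())
```

In this toy setting the actor simply recovers the watcher's gains. In an actor-critic method a supervised term of this kind would presumably be combined with the critic's signal rather than replace it.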

Published in Drones

ISSN: 2504-446X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Motor vehicles. Aeronautics. Astronautics
Website: http://www.mdpi.com/journal/drones
