Xibei Gongye Daxue Xuebao (Oct 2022)

Study on UAV obstacle avoidance algorithm based on deep recurrent double Q network

  • WEI Yao,
  • LIU Zhicheng,
  • CAI Bin,
  • CHEN Jiaxin,
  • YANG Yao,
  • ZHANG Kai

DOI
https://doi.org/10.1051/jnwpu/20224050970
Journal volume & issue
Vol. 40, no. 5
pp. 970–979

Abstract

Traditional reinforcement learning methods suffer from value-function overestimation and partial observability in machine motion planning, and especially in the UAV obstacle avoidance problem, which leads to long training times and poor convergence during network training. This paper proposes a UAV obstacle avoidance algorithm based on a deep recurrent double Q network. By replacing the single-network structure with a dual-network structure, optimal action selection is decoupled from action-value estimation, reducing overestimation of the value function. A GRU recurrent neural network module is introduced after the fully connected layer to process information along the time dimension, enhancing the network's ability to exploit temporal context and improving the algorithm's performance in partially observable environments. On this basis, a prioritized experience replay mechanism is combined with the network to accelerate convergence. Finally, the original and improved algorithms are tested in a simulation environment. The experimental results show that the improved algorithm performs better in terms of training time, obstacle avoidance success rate, and robustness.
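The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; `q_online` and `q_target` are hypothetical stand-in value tables (state → list of action values) substituting for the two neural networks, and the function names are illustrative. The online network picks the greedy action, while the separate target network evaluates it, which is the double-Q mechanism that reduces overestimation bias:

```python
def double_q_target(reward, next_state, q_online, q_target, gamma=0.99, done=False):
    """Double-Q target: select the greedy action with the online network,
    but evaluate it with the target network. Decoupling selection from
    evaluation reduces the overestimation bias of max-based targets."""
    if done:
        return reward
    best_action = max(range(len(q_online[next_state])),
                      key=lambda a: q_online[next_state][a])
    return reward + gamma * q_target[next_state][best_action]


def vanilla_q_target(reward, next_state, q_target, gamma=0.99, done=False):
    """Standard DQN target: the same network both selects and evaluates
    the action, so noisy overestimates are propagated by the max."""
    if done:
        return reward
    return reward + gamma * max(q_target[next_state])


if __name__ == "__main__":
    # Toy case where the two networks disagree: the online net prefers
    # action 0, while the target net overestimates action 1.
    q_online = {"s1": [1.0, 0.2]}
    q_target = {"s1": [0.5, 2.0]}
    print(double_q_target(0.0, "s1", q_online, q_target))   # evaluates action 0
    print(vanilla_q_target(0.0, "s1", q_target))            # takes the raw max
```

In this toy case the double-Q target is 0.99 × 0.5 = 0.495, while the vanilla max-based target is 0.99 × 2.0 = 1.98, showing how the second network's independent evaluation tempers an inflated estimate.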

Keywords