Obstacle avoidance USV in multi-static obstacle environments based on a deep reinforcement learning approach

Dengyao Jiang; Mingzhe Yuan; Junfeng Xiong; Jinchao Xiao; Yong Duan

doi:10.1177/00202940231195937

Measurement + Control (Apr 2024)

Affiliations

Dengyao Jiang: Guangzhou Institute of Industrial Intelligence, Guangzhou, China
Mingzhe Yuan: Academicians Experts Workstation, Guangzhou Institute of Industrial Intelligence, Guangzhou, China
Junfeng Xiong: Academicians Experts Workstation, Guangzhou Institute of Industrial Intelligence, Guangzhou, China
Jinchao Xiao: Academicians Experts Workstation, Guangzhou Institute of Industrial Intelligence, Guangzhou, China
Yong Duan: School of Information Science and Engineering, Shenyang University of Technology, Shenyang, China

Journal volume & issue: Vol. 57

Abstract

Unmanned surface vehicles (USVs) are intelligent platforms for unmanned surface navigation that build on artificial intelligence, motion control, environmental awareness, and related technologies. Obstacle avoidance is an important part of their autonomous navigation. USVs operate on water in scenarios such as monitoring and tracking or search and rescue, and the dynamic, complex operating environment makes traditional methods unsuitable for the USV obstacle avoidance problem. In this paper, we address the poor convergence of the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, a Deep Reinforcement Learning (DRL) method, in unstructured environments with wave and current interference. We propose a random walk policy as a pre-exploration policy whose experience is deposited into the experience pool to accelerate convergence. The resulting USV obstacle avoidance achieves collision-free navigation from any start point to a given end point in a dynamic and complex environment, without offline trajectory or track point generation. We design a pre-exploration policy for the environment and a virtual simulation environment for training and testing the algorithm, and we give the reward function and training method. Simulation results show that the proposed algorithm converges more easily than the original algorithm and performs better obstacle avoidance in complex environments, which reflects its feasibility and effectiveness.
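
The pre-exploration idea in the abstract can be illustrated with a minimal Python sketch: transitions collected by a random walk are stored in the experience pool (replay buffer) before any off-policy learner such as TD3 starts sampling from it. Everything here is an assumption for illustration and not taken from the paper: the `ToyUSVEnv` class, its kinematics, its reward values, and the `prefill_with_random_walk` helper are all hypothetical stand-ins.

```python
import math
import random
from collections import deque


class ToyUSVEnv:
    """Hypothetical 2-D stand-in for a USV simulation with static obstacles."""

    def __init__(self, goal=(10.0, 10.0), obstacles=((5.0, 5.0, 1.0),)):
        self.goal = goal
        self.obstacles = obstacles  # each entry is (x, y, radius)
        self.pos = (0.0, 0.0)

    def reset(self):
        self.pos = (0.0, 0.0)
        return self.pos

    def step(self, action):
        # action = (speed, heading in radians); simplified point kinematics
        speed, heading = action
        x = self.pos[0] + speed * math.cos(heading)
        y = self.pos[1] + speed * math.sin(heading)
        self.pos = (x, y)
        collided = any(
            math.hypot(x - ox, y - oy) <= r for ox, oy, r in self.obstacles
        )
        reached = math.hypot(x - self.goal[0], y - self.goal[1]) <= 0.5
        # Placeholder reward values, not the reward function from the paper
        reward = -10.0 if collided else (10.0 if reached else -0.1)
        return self.pos, reward, collided or reached


def prefill_with_random_walk(env, buffer, n_steps, max_turn=0.5, seed=0):
    """Store n_steps random-walk transitions in the buffer before training."""
    rng = random.Random(seed)
    state = env.reset()
    heading = 0.0
    for _ in range(n_steps):
        # Random walk: perturb the heading by a small random turn each step
        heading += rng.uniform(-max_turn, max_turn)
        action = (1.0, heading)
        next_state, reward, done = env.step(action)
        buffer.append((state, action, reward, next_state, done))
        if done:
            # Episode ended (collision or goal): start a new one
            state = env.reset()
            heading = 0.0
        else:
            state = next_state
    return buffer


replay_buffer = deque(maxlen=100_000)
prefill_with_random_walk(ToyUSVEnv(), replay_buffer, n_steps=1_000)
```

In a setup like this, the learner would begin its updates with a buffer that already contains collision and free-motion samples, which is one plausible reason pre-exploration could help convergence.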

Published in Measurement + Control

ISSN: 0020-2940 (Print); 2051-8730 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Technology: Mechanical engineering and machinery: Control engineering systems. Automatic machinery (General); Technology: Technology (General)
Website: https://journals.sagepub.com/home/mac
