Double Broad Reinforcement Learning Based on Hindsight Experience Replay for Collision Avoidance of Unmanned Surface Vehicles

Jiabao Yu; Jiawei Chen; Ying Chen; Zhiguo Zhou; Junwei Duan

doi:10.3390/jmse10122026

Journal of Marine Science and Engineering (Dec 2022)

Affiliations

Jiabao Yu: School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
Jiawei Chen: School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
Ying Chen: School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
Zhiguo Zhou: School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081, China
Junwei Duan: College of Information Science and Technology, Jinan University, Guangzhou 510632, China

Journal volume & issue: Vol. 10, no. 12
p. 2026

Abstract

Although broad reinforcement learning (BRL) offers a more intelligent autonomous decision-making method for the collision avoidance problem of unmanned surface vehicles (USVs), it still suffers from Q-value overestimation and converges slowly because rewards are sparse over large sea areas. To address these issues, we propose double broad reinforcement learning based on hindsight experience replay (DBRL-HER) for the collision avoidance system of USVs, improving the efficiency and accuracy of decision-making. The algorithm decouples target action selection from target Q-value calculation to form double broad reinforcement learning, and then adopts hindsight experience replay so that the agent can learn from failed experiences, which greatly improves sample utilization efficiency. In training in a grid environment, the collision avoidance success rate of the proposed algorithm was 31.9 percentage points higher than that of the deep Q-network (DQN) and 24.4 percentage points higher than that of BRL. A high-fidelity Unity 3D simulation platform was also designed to simulate USV motion, and an experiment on this platform verified the effectiveness of the proposed algorithm.
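
The abstract names two mechanisms: a double-Q target, in which one network selects the next action and a separate target network evaluates it, and hindsight experience replay (HER), which relabels failed episodes with a goal that was actually reached. The following is a minimal Python sketch of both on a toy grid, not the authors' implementation. Several assumptions are ours: the Q-functions are plain callables standing in for the broad-learning networks, the sparse reward (0 at the goal, -1 otherwise) and the HER "final" relabeling strategy come from the general HER literature and may differ from the paper's choices, and all names (`Transition`, `double_q_target`, `her_relabel_final`, `sparse_reward`) are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple

# Hypothetical grid-world types (illustrative assumptions, not the paper's code):
# a cell is (row, col); a Q-function maps (state, goal) to a list of action values.
Cell = Tuple[int, int]
QFunc = Callable[[Cell, Cell], Sequence[float]]


@dataclass
class Transition:
    state: Cell
    action: int
    next_state: Cell
    goal: Cell
    reward: float
    done: bool


def sparse_reward(achieved: Cell, goal: Cell) -> float:
    # Assumed sparse reward: 0 on reaching the goal, -1 on every other step.
    return 0.0 if achieved == goal else -1.0


def double_q_target(t: Transition, q_online: QFunc, q_target: QFunc,
                    gamma: float = 0.99) -> float:
    """Double-Q target: the online function selects the next action,
    and the target function evaluates that action."""
    if t.done:
        return t.reward
    online_vals = q_online(t.next_state, t.goal)
    best_action = max(range(len(online_vals)), key=lambda a: online_vals[a])
    return t.reward + gamma * q_target(t.next_state, t.goal)[best_action]


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """HER 'final' strategy (assumed): treat the last achieved cell as the goal,
    so a failed episode also yields a trajectory that reaches its goal."""
    if not episode:
        return []
    new_goal = episode[-1].next_state
    relabeled = []
    for t in episode:
        r = sparse_reward(t.next_state, new_goal)
        relabeled.append(replace(t, goal=new_goal, reward=r, done=(r == 0.0)))
    return relabeled


# Usage: a failed 3-step episode in which the goal (5, 5) is never reached.
goal = (5, 5)
path = [(0, 0), (0, 1), (1, 1), (1, 2)]
episode = [Transition(path[i], 0, path[i + 1], goal,
                      sparse_reward(path[i + 1], goal), False)
           for i in range(3)]
relabeled = her_relabel_final(episode)  # new goal is (1, 2); last step rewarded


def q_online(state: Cell, goal: Cell) -> List[float]:
    return [1.0, 3.0, 2.0]  # toy values: the online function picks action 1


def q_target(state: Cell, goal: Cell) -> List[float]:
    return [0.5, 0.2, 0.9]  # toy values: the target function scores action 1 as 0.2


y = double_q_target(relabeled[0], q_online, q_target, gamma=0.9)
# y is about -0.82; a plain max over q_target would give about -0.19
```

The contrast in the last comment illustrates why decoupling selection from evaluation can curb overestimation: the plain DQN target takes the maximum of the target values, whereas the double target only uses the target value of the action the online function prefers.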

Published in Journal of Marine Science and Engineering

ISSN: 2077-1312 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Naval Science: Naval architecture. Shipbuilding. Marine engineering; Geography. Anthropology. Recreation: Oceanography
Website: http://www.mdpi.com/journal/jmse
