Learning Adaptive Control of a UUV Using a Bio-Inspired Experience Replay Mechanism

Thomas Chaffre; Paulo E. Santos; Gilles Le Chenadec; Estelle Chauveau; Karl Sammut; Benoit Clement

doi:10.1109/ACCESS.2023.3329136

IEEE Access (Jan 2023)

Affiliations

Thomas Chaffre: ORCiD; College of Science and Engineering, Flinders University, Adelaide, SA, Australia
Paulo E. Santos: ORCiD; College of Science and Engineering, Flinders University, Adelaide, SA, Australia
Gilles Le Chenadec: ORCiD; Lab-STICC UMR CNRS 6285, ENSTA Bretagne, Brest, France
Estelle Chauveau: Naval Group Research, Ollioules, France
Karl Sammut: ORCiD; College of Science and Engineering, Flinders University, Adelaide, SA, Australia
Benoit Clement: ORCiD; College of Science and Engineering, Flinders University, Adelaide, SA, Australia

DOI: https://doi.org/10.1109/ACCESS.2023.3329136
Journal volume & issue: Vol. 11
pp. 123505 – 123518

Abstract

Deep Reinforcement Learning (DRL) methods are increasingly being applied to Unmanned Underwater Vehicles (UUVs) to provide adaptive control responses to environmental disturbances. However, on physical platforms, these methods are hindered by their inherent data inefficiency and by performance degradation when subjected to unforeseen process variations. This is particularly pronounced in UUV manoeuvring tasks, where process observability is limited due to the complex dynamics of the environment in which these vehicles operate. To overcome these limitations, this paper proposes a novel Biologically-Inspired Experience Replay method (BIER), which relies on two types of memory buffers: one that uses incomplete (but recent) trajectories of state-action pairs, and another that emphasises positive rewards. The BIER method’s ability to generalise was assessed by training neural network controllers on inverted pendulum stabilisation, hopping, walking, and HalfCheetah running tasks from the Gym-based MuJoCo continuous control benchmark. BIER was then used with the Soft Actor-Critic (SAC) method on UUV manoeuvring tasks to stabilise the vehicle at a given velocity and pose under unknown environment dynamics. The proposed method was evaluated in simulated scenarios of progressively increasing complexity in a ROS-based UUV Simulator. These scenarios varied in target velocity values and in the intensity of current disturbances. The results showed that BIER outperformed standard Experience Replay (ER) methods, reaching optimal performance twice as fast in the UUV domain considered.
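
The two-buffer idea summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DualReplayBuffer`, the `reward > 0` admission rule, the fixed `positive_fraction` mixing ratio, and the storage of single transitions rather than partial trajectories are all assumptions made here for illustration only.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Illustrative two-buffer replay memory (hypothetical, not the paper's code)."""

    def __init__(self, recent_capacity=1000, positive_capacity=1000,
                 positive_fraction=0.5, seed=None):
        # FIFO buffer: old transitions are dropped, so it only holds recent ones.
        self.recent = deque(maxlen=recent_capacity)
        # Second buffer: keeps only transitions with reward > 0 (assumed criterion).
        self.positive = deque(maxlen=positive_capacity)
        # Share of each batch drawn from the positive buffer (assumed mixing rule).
        self.positive_fraction = positive_fraction
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.recent.append(transition)
        if reward > 0:
            self.positive.append(transition)

    def sample(self, batch_size):
        # Take up to the requested share from the positive buffer,
        # then fill the remainder from the recent buffer.
        n_pos = min(int(batch_size * self.positive_fraction), len(self.positive))
        n_recent = min(batch_size - n_pos, len(self.recent))
        batch = self.rng.sample(list(self.positive), n_pos)
        batch += self.rng.sample(list(self.recent), n_recent)
        return batch
```

In an off-policy learner such as SAC, a buffer of this kind would stand in for the standard uniform replay memory: `add` is called after every environment step and `sample` before every gradient update.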

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
