IEEE Access (Jan 2024)
Double Critics and Double Actors Deep Deterministic Policy Gradient for Mobile Robot Navigation Using Adaptive Parameter Space Noise and Parallel Experience Replay
Abstract
In recent years, mobile robots have been increasingly used in a variety of industries, such as manufacturing, logistics, health care, and security services. Effective, collision-free autonomous navigation has become crucial for mobile robots to perform tasks efficiently in these diverse applications. Deep reinforcement learning (DRL) is promising for path planning and autonomous navigation, but the Deep Q-Network (DQN) is limited to discrete action spaces. The deep deterministic policy gradient (DDPG) algorithm was developed to extend DRL to continuous action spaces, but it suffers from insufficient exploration and Q-value overestimation. To address these challenges, this paper proposes the adaptive parameter space noise with parallel experience replay, double critics and double actors DDPG (AP-D4PG) algorithm, an enhanced variant of DDPG. It incorporates adaptive parameter space noise in the actor networks for better exploration, double critics and double actors to mitigate overestimation and improve the accuracy of the target Q-value estimate, and parallel experience replay to strike a better balance between exploration and exploitation, thereby significantly improving autonomous navigation performance for mobile robots. Furthermore, a novel target-oriented reward function is employed to boost learning efficiency. Additionally, to further bolster stability and robustness, gradient clipping and Xavier initialization are applied. The efficacy of the proposed model is assessed through numerical experiments using the Robot Operating System (ROS) and Gazebo. The experimental findings demonstrate that the AP-D4PG algorithm achieves faster convergence and enhanced robustness, with a 3.83% and 7.78% increase in navigation accuracy, and a 61.14% and 48.93% higher average score in static and dynamic scenarios, respectively, compared to the traditional DDPG. These improvements, together with the robustness of AP-D4PG in both static and dynamic environments, indicate that refining network architectures, exploration strategies, and reward mechanisms can enhance the performance of DRL algorithms, opening up new possibilities for improving the efficiency and effectiveness of other DRL methods.
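The abstract names several algorithmic components without detailing them; the following is a minimal, hypothetical sketch of two of these ideas, assuming a plain NumPy setting: forming the target Q-value from the minimum of two critics to curb overestimation, and adapting the parameter space noise scale from the action-space distance between the plain and perturbed policies. The function names, constants, and update rule here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def double_critic_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Target Q-value using the minimum of two target critics,
    which curbs the overestimation bias of a single critic."""
    q_next = np.minimum(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * q_next

def adapt_param_noise_scale(sigma, action_plain, action_perturbed,
                            target_distance=0.2, alpha=1.01):
    """Grow or shrink the parameter-space noise scale so that the
    induced perturbation in action space stays near a target distance."""
    distance = np.sqrt(np.mean((action_plain - action_perturbed) ** 2))
    return sigma * alpha if distance < target_distance else sigma / alpha

# Toy usage with illustrative numbers
target_q = double_critic_target(reward=1.0, done=0.0, q1_next=5.2, q2_next=4.8)
sigma = adapt_param_noise_scale(0.1,
                                np.array([0.3, -0.1]),
                                np.array([0.35, -0.05]))
print(target_q, sigma)
```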
Keywords