IEEE Access (Jan 2024)
Double Critics and Double Actors Deep Deterministic Policy Gradient for Mobile Robot Navigation Using Adaptive Parameter Space Noise and Parallel Experience Replay
Abstract
In recent years, mobile robots have been increasingly used in a variety of industries, such as manufacturing, logistics, health care, and security services. Effective, collision-free autonomous navigation has become crucial for mobile robots to perform tasks efficiently in these diverse applications. Deep reinforcement learning (DRL) is promising for path planning and autonomous navigation, but the Deep Q-Network (DQN) is limited to discrete action spaces. The deep deterministic policy gradient (DDPG) algorithm was developed to extend DRL to continuous action spaces, but it suffers from insufficient exploration and Q-value overestimation. To address these challenges, this paper proposes the adaptive parameter space noise with parallel experience replay, double critics and double actors DDPG (AP-D4PG) algorithm, an enhanced variant of DDPG. It incorporates adaptive parameter space noise in the actor networks for better exploration, double critics and double actors to mitigate overestimation and improve the accuracy of the target Q-value estimate, and parallel experience replay to strike a better balance between exploration and exploitation, thereby significantly improving autonomous navigation performance for mobile robots. Furthermore, a novel target-oriented reward function is employed to boost learning efficiency. Additionally, to further bolster stability and robustness, gradient clipping and Xavier initialization are applied. The efficacy of the proposed model is assessed through numerical experiments using the Robot Operating System (ROS) and Gazebo. The experimental findings demonstrate that the AP-D4PG algorithm achieves faster convergence and enhanced robustness, with a 3.83% and 7.78% increase in navigation accuracy, and a 61.14% and 48.93% higher average score in static and dynamic scenarios, respectively, compared to the traditional DDPG. These improvements, together with the robustness of AP-D4PG in both static and dynamic environments, indicate that refining network architectures, exploration strategies, and reward mechanisms can enhance the performance of DRL algorithms, opening up new possibilities for improving the efficiency and effectiveness of other DRL methods.
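The abstract names several algorithmic components without detailing them; the following is a minimal, hypothetical sketch of two of these ideas, assuming a plain NumPy setting: forming the target Q-value from the minimum of two critics to curb overestimation, and adapting the parameter space noise scale from the action-space distance between the plain and perturbed policies. The function names, constants, and update rule here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def double_critic_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Target Q-value using the minimum of two target critics,
    which curbs the overestimation bias of a single critic."""
    q_next = np.minimum(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * q_next

def adapt_param_noise_scale(sigma, action_plain, action_perturbed,
                            target_distance=0.2, alpha=1.01):
    """Grow or shrink the parameter-space noise scale so that the
    induced perturbation in action space stays near a target distance."""
    distance = np.sqrt(np.mean((action_plain - action_perturbed) ** 2))
    return sigma * alpha if distance < target_distance else sigma / alpha

# Toy usage with illustrative numbers
target_q = double_critic_target(reward=1.0, done=0.0, q1_next=5.2, q2_next=4.8)
sigma = adapt_param_noise_scale(0.1,
                                np.array([0.3, -0.1]),
                                np.array([0.35, -0.05]))
print(target_q, sigma)
```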
Keywords