IEEE Access (Jan 2024)
MP-TD3: Multi-Pool Prioritized Experience Replay-Based Asynchronous Twin Delayed Deep Deterministic Policy Gradient Algorithm
Abstract
Prioritized experience replay mechanisms have achieved remarkable success in accelerating the convergence of reinforcement learning algorithms. However, applying traditional prioritized experience replay directly to asynchronous reinforcement learning leads to slow convergence, because it is difficult for one agent to utilize the excellent experiences obtained by other agents interacting with the environment. To address this issue, we propose a Multi-pool Prioritized experience replay-based asynchronous Twin Delayed Deep Deterministic policy gradient algorithm (MP-TD3). Specifically, a multi-pool prioritized experience replay mechanism is proposed to strengthen experience sharing among different agents and thereby accelerate network convergence. Then, two global-pool self-cleaning mechanisms, one based on sample diversity and one based on TD-errors, are designed to overcome the high redundancy and low information content of samples in the global pool, respectively. Finally, a multi-batch sampling mechanism is investigated to further reduce the training time. Extensive experiments validate that the proposed MP-TD3 significantly improves convergence speed and performance compared with state-of-the-art methods.
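To make the described mechanism concrete, the following is a minimal, hypothetical Python sketch of a multi-pool prioritized replay buffer: each agent keeps a local pool, high-TD-error transitions are promoted to a shared global pool so other agents can reuse them, and the global pool self-cleans by evicting its lowest-TD-error samples. All class/parameter names and thresholds here are illustrative assumptions, not the paper's implementation.

```python
import random


class MultiPoolReplay:
    """Illustrative multi-pool prioritized replay (not the paper's code).

    Each agent writes to its own local pool; transitions whose TD-error
    exceeds a threshold are also copied into a shared global pool, which
    self-cleans by discarding its lowest-TD-error entries when full.
    """

    def __init__(self, n_agents, local_capacity, global_capacity, promote_threshold):
        self.local = [[] for _ in range(n_agents)]  # per-agent pools of (td_error, transition)
        self.global_pool = []                       # shared pool of (td_error, transition)
        self.local_capacity = local_capacity
        self.global_capacity = global_capacity
        self.promote_threshold = promote_threshold

    def add(self, agent_id, transition, td_error):
        pool = self.local[agent_id]
        pool.append((td_error, transition))
        if len(pool) > self.local_capacity:
            pool.pop(0)  # FIFO eviction in the local pool
        if td_error >= self.promote_threshold:
            # Promote high-value experience so other agents can sample it.
            self.global_pool.append((td_error, transition))
            self._clean_global()

    def _clean_global(self):
        # TD-error-based self-cleaning: keep only the highest-priority samples.
        if len(self.global_pool) > self.global_capacity:
            self.global_pool.sort(key=lambda e: e[0], reverse=True)
            del self.global_pool[self.global_capacity:]

    def sample(self, agent_id, batch_size, global_frac=0.5):
        # Mix globally shared experience with the agent's own local pool,
        # drawing from each proportionally to TD-error priority.
        n_global = min(int(batch_size * global_frac), len(self.global_pool))
        n_local = batch_size - n_global
        batch = []
        for pool, n in ((self.global_pool, n_global), (self.local[agent_id], n_local)):
            if n > 0 and pool:
                weights = [td + 1e-6 for td, _ in pool]  # epsilon keeps weights positive
                batch += [t for _, t in random.choices(pool, weights=weights, k=n)]
        return batch
```

In this sketch, an agent that has collected few experiences of its own (e.g. a freshly started worker) can still draw informative, high-TD-error samples from the global pool, which is the interaction the abstract credits for faster convergence.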
Keywords