IEEE Access (Jan 2022)
A-TD3: An Adaptive Asynchronous Twin Delayed Deep Deterministic for Continuous Action Spaces
Abstract
Twin delayed deep deterministic (TD3) policy gradient is an effective algorithm for continuous action spaces. However, it cannot explore the environment efficiently and suffers from slow convergence, mainly because it learns policies in a serial mode. Asynchronous reinforcement learning algorithms, e.g., asynchronous advantage actor-critic (A3C), explore the environment effectively, but they ignore the gradient information of the different local agents, which limits their performance. To address both problems, we propose an asynchronous twin delayed deep deterministic algorithm, denoted A-TD3, with an adaptive update strategy for continuous action spaces. Specifically, a parallel mechanism is used to improve the convergence speed, along with two adaptive weight functions based on off-policy learning that dynamically adjust the weights of the local agents. The experimental results show that the proposed A-TD3 algorithm achieves training execution time and convergence speed comparable to those of conventional TD3 and other state-of-the-art algorithms.
Keywords