IEEE Access (Jan 2022)

A-TD3: An Adaptive Asynchronous Twin Delayed Deep Deterministic for Continuous Action Spaces

  • Jiaolv Wu,
  • Q. M. Jonathan Wu,
  • Shuyue Chen,
  • Farhad Pourpanah,
  • Detian Huang

DOI
https://doi.org/10.1109/ACCESS.2022.3226446
Journal volume & issue
Vol. 10
pp. 128077 – 128089

Abstract


Twin delayed deep deterministic (TD3) policy gradient is an effective algorithm for continuous action spaces. However, it cannot efficiently explore the state space and suffers from slow convergence, mainly because it learns policies in a serial mode. On the other hand, asynchronous reinforcement learning algorithms, e.g., asynchronous advantage actor-critic (A3C), are effective at exploring the environment, but they ignore the gradient information of the different local agents, which limits their performance. To address these problems, in this study we propose an asynchronous twin delayed deep deterministic algorithm, denoted A-TD3, with an adaptive update strategy for continuous action spaces. Specifically, a parallel mechanism is used to improve the convergence speed, along with two adaptive weight functions based on off-policy learning that dynamically adjust the weights of the local agents. The experimental results show that the proposed A-TD3 algorithm produces results comparable to those of conventional TD3 and other state-of-the-art algorithms in terms of training execution time and convergence speed.
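The core idea of the adaptive update strategy can be illustrated with a minimal sketch. The paper's exact weight functions are not given here, so the snippet below makes an illustrative assumption: each local agent reports the magnitude of its TD error, and a softmax over those magnitudes assigns larger weights to agents whose critics currently fit better, before the global network aggregates the weighted gradients. The function names (`adaptive_weights`, `aggregate_gradients`) and the softmax-over-TD-error rule are hypothetical stand-ins for the authors' adaptive weight functions.

```python
import numpy as np

def adaptive_weights(td_errors, temperature=1.0):
    """Hypothetical adaptive weighting: softmax over the negative
    magnitude of each local agent's TD error, so agents with smaller
    critic error contribute more to the global update."""
    scores = -np.abs(np.asarray(td_errors, dtype=float)) / temperature
    exp = np.exp(scores - scores.max())  # shift for numerical stability
    return exp / exp.sum()

def aggregate_gradients(local_grads, weights):
    """Weighted average of per-agent gradients applied to the global
    network (the asynchronous, parallel part of the scheme)."""
    local_grads = np.asarray(local_grads, dtype=float)
    return (weights[:, None] * local_grads).sum(axis=0)

# Three local agents with increasing TD-error magnitude:
w = adaptive_weights([0.1, 1.0, 2.0])
g = aggregate_gradients([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], w)
```

Here the first agent, with the smallest TD error, receives the largest weight, and the aggregated gradient is a convex combination of the three local gradients. In an actual A-TD3 implementation the aggregation would run asynchronously as workers finish their rollouts, rather than in lockstep as sketched here.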

Keywords