IEEE Access (Jan 2021)

Deep Deterministic Policy Gradient Based on Double Network Prioritized Experience Replay

  • Chaohai Kang,
  • Chuiting Rong,
  • Weijian Ren,
  • Fengcai Huo,
  • Pengyun Liu

DOI
https://doi.org/10.1109/ACCESS.2021.3074535
Journal volume & issue
Vol. 9
pp. 60296–60308

Abstract

The traditional deep deterministic policy gradient (DDPG) algorithm suffers from slow convergence and a tendency to fall into local optima. To address these two problems, this paper proposes a DDPG algorithm based on a double-network prioritized experience replay mechanism (DNPER-DDPG). First, the value function is approximated with two neural networks, and the minimum of the action-value estimates produced by the two networks is used to update the actor policy network, which reduces the likelihood of converging to a locally optimal policy. Then, the Q values produced by the two networks and the immediate reward obtained from the environment are used as the prioritization criterion, ranking the samples in the experience replay buffer by importance and thereby improving the convergence speed of the algorithm. Finally, the improved method is evaluated on the classic control environments of OpenAI Gym, and the results show that the proposed method achieves faster convergence and higher cumulative reward than the comparison algorithms.
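
The two ideas summarized above, updating the actor against the minimum of two critic estimates and prioritizing replay samples using the two networks' Q values together with the immediate reward, can be sketched as follows. This is a hypothetical Python/PyTorch sketch, not the authors' implementation; the layer sizes, the exact priority formula, and all weights are assumptions made for illustration.

# Hypothetical sketch of the two ideas in the abstract; not the authors' code.
# Layer sizes, the priority formula, and the weights below are assumptions.
import torch
import torch.nn as nn


class Actor(nn.Module):
    # Deterministic policy a = pi(s), bounded to [-max_action, max_action].
    def __init__(self, state_dim, action_dim, max_action=1.0, hidden=256):
        super().__init__()
        self.max_action = max_action
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.max_action * self.net(state)


class Critic(nn.Module):
    # Action-value network Q(s, a).
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def critic_target(critic1_t, critic2_t, actor_t, reward, next_state, done, gamma=0.99):
    # Double-network target: bootstrap from the smaller of the two target
    # critics, which limits the overestimation bias of a single critic.
    with torch.no_grad():
        next_action = actor_t(next_state)
        q_next = torch.min(critic1_t(next_state, next_action),
                           critic2_t(next_state, next_action))
        return reward + gamma * (1.0 - done) * q_next


def actor_loss(critic1, critic2, actor, state):
    # Update the policy against the minimum of the two critics' estimates,
    # as described in the abstract, to avoid chasing an overestimated Q.
    action = actor(state)
    q = torch.min(critic1(state, action), critic2(state, action))
    return -q.mean()


def replay_priority(q1, q2, reward, w_q=1.0, w_r=1.0, eps=1e-3):
    # Assumed prioritization criterion combining the two networks' Q values
    # (here, their disagreement) with the immediate reward; the paper's
    # exact formula may differ.
    return w_q * torch.abs(q1 - q2) + w_r * torch.abs(reward) + eps

Per-sample values from replay_priority would then drive proportional sampling in a standard prioritized replay buffer, so that transitions judged more informative are replayed more often.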

Keywords