Jisuanji kexue (Oct 2021)

Deep Deterministic Policy Gradient with Episode Experience Replay

  • ZHANG Jian-hang, LIU Quan

DOI
https://doi.org/10.11896/jsjkx.200900208
Journal volume & issue
Vol. 48, no. 10
pp. 37 – 43

Abstract

Research on continuous control in reinforcement learning has been a hot topic in recent years. The deep deterministic policy gradient (DDPG) algorithm performs well in continuous control tasks. DDPG trains its network model with an experience replay mechanism. To further improve the efficiency of experience replay in DDPG, the cumulative reward is used as the basis for classifying transitions, and a deep deterministic policy gradient with episodic experience replay (EER-DDPG) algorithm is proposed. First, transitions are stored in units of episodes, and two replay buffers are introduced to classify the episodes according to their cumulative reward. Then, during network model training, the quality of the policy is improved by randomly sampling episodes with large cumulative reward. The algorithm is verified by experiments on continuous control tasks and compared with the DDPG algorithm, the trust region policy optimization (TRPO) algorithm, and the proximal policy optimization (PPO) algorithm. The experimental results show that EER-DDPG achieves better performance.
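The abstract describes the core mechanism of EER-DDPG: whole episodes are stored, routed into two replay buffers by cumulative reward, and training batches are drawn preferentially from high-return episodes. The Python sketch below illustrates one possible reading of that mechanism; the class name, the running-mean threshold used to split the buffers, and the sampling rule are assumptions for illustration, since the abstract does not give the paper's exact criteria.

    # A minimal sketch of episodic experience replay with two buffers,
    # assuming a running-mean return threshold and uniform sampling
    # from the high-return buffer (illustrative choices, not the paper's).
    import random
    from collections import deque

    class EpisodeReplayBuffer:
        """Stores whole episodes, split into two buffers by episodic return."""

        def __init__(self, capacity=1000):
            self.high_return = deque(maxlen=capacity)  # episodes with large cumulative reward
            self.low_return = deque(maxlen=capacity)   # remaining episodes
            self.threshold = 0.0                       # running mean of episodic returns (assumed criterion)
            self.count = 0

        def add_episode(self, transitions):
            """transitions: list of (state, action, reward, next_state, done) tuples."""
            episode_return = sum(t[2] for t in transitions)
            # Update the running-mean threshold used to classify episodes.
            self.count += 1
            self.threshold += (episode_return - self.threshold) / self.count
            if episode_return >= self.threshold:
                self.high_return.append(transitions)
            else:
                self.low_return.append(transitions)

        def sample(self, batch_size=64):
            """Draw a training batch, favouring episodes with large cumulative reward."""
            buffer = self.high_return if self.high_return else self.low_return
            batch = []
            while len(batch) < batch_size:
                episode = random.choice(buffer)        # pick a stored episode at random
                batch.append(random.choice(episode))   # then a transition within it
            return batch

In a DDPG training loop, the agent would call add_episode at the end of each rollout and sample before each gradient update of the actor and critic networks.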

Keywords