Journal of Taiyuan University of Technology (Jul 2024)
Deep Reinforcement Learning with Phasic Policy Gradient with Sample Reuse
Abstract
Purposes The phasic policy gradient with sample reuse (SR-PPG) algorithm is proposed to address the problems of non-reuse of samples and low sample utilization in policy-based deep reinforcement learning algorithms. Methods The proposed algorithm introduces offline data on the basis of the phasic policy gradient (PPG), reducing the time cost of training and enabling the model to converge quickly. SR-PPG combines the stability advantages of theoretically supported on-policy algorithms with the sample efficiency of off-policy algorithms, develops policy improvement guarantees applicable to the off-policy setting, and links these bounds to the clipping mechanism used by PPG. Findings A series of theoretical and experimental results show that the algorithm achieves better performance by effectively balancing the competing goals of stability and sample efficiency.
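The clipping mechanism the abstract refers to can be illustrated with a minimal sketch of a PPO/PPG-style clipped surrogate objective evaluated on reused (off-policy) samples; the function name, hyperparameters, and toy values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def clipped_surrogate_objective(ratio, advantage, eps=0.2):
    """PPO/PPG-style clipped policy objective (to be maximized).

    ratio: importance weights pi_theta(a|s) / pi_behavior(a|s);
           when samples are reused off-policy, these drift away from 1.
    advantage: advantage estimates for the sampled actions.
    eps: clipping range (illustrative default).
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The element-wise minimum bounds how much a single update can
    # exploit large importance weights, which is what ties the clipping
    # range to policy improvement guarantees in the off-policy setting.
    return np.minimum(unclipped, clipped).mean()

# Reused samples whose ratios have drifted from 1
ratios = np.array([0.5, 1.0, 1.6])
advantages = np.array([1.0, 1.0, -1.0])
objective = clipped_surrogate_objective(ratios, advantages)
```

Here the pessimistic minimum keeps the objective conservative for samples with large ratio drift, which is the balance between stability and sample efficiency the abstract describes.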
Keywords