An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization

Rousslan Fernand Julien Dossa; Shengyi Huang; Santiago Ontanon; Takashi Matsubara

doi:10.1109/ACCESS.2021.3106662

IEEE Access (Jan 2021)

Affiliations

Rousslan Fernand Julien Dossa: Graduate School of System Informatics, Kobe University, Hyogo, Japan
Shengyi Huang: College of Computing & Informatics, Drexel University, Philadelphia, PA, USA
Santiago Ontanon: College of Computing & Informatics, Drexel University, Philadelphia, PA, USA
Takashi Matsubara: Graduate School of Engineering Science, Osaka University, Osaka, Japan

DOI: https://doi.org/10.1109/ACCESS.2021.3106662
Journal volume & issue: Vol. 9, pp. 117981–117992

Abstract

Code-level optimizations, which are low-level optimization techniques used in the implementation of algorithms, have generally been considered tangential and often do not appear in the published pseudo-code of Reinforcement Learning (RL) algorithms. However, recent studies suggest that these optimizations are critical to the performance of algorithms such as Proximal Policy Optimization (PPO). In this paper, we investigate the effect of one such optimization, known as “early stopping”, which is implemented for PPO in the popular openai/spinningup library but not in openai/baselines. This technique, which we refer to as KLE-Stop, can stop the policy update within an epoch if the mean Kullback-Leibler (KL) divergence between the target policy and the current policy becomes too high. More specifically, we conduct experiments to examine the empirical importance of KLE-Stop and its conservative variant KLE-Rollback when they are used in conjunction with other common code-level optimizations. The main findings of our experiments are: 1) the performance of PPO is sensitive to the number of update iterations per epoch ($K$); 2) early stopping optimizations (KLE-Stop and KLE-Rollback) mitigate this sensitivity by dynamically adjusting the actual number of update iterations within an epoch; and 3) early stopping optimizations could serve as a convenient alternative to tuning $K$.
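The early stopping idea described in the abstract can be illustrated with a minimal sketch. It is not the implementation from the paper or from openai/spinningup. It assumes a toy softmax policy over three actions in a single state, a plain policy-gradient step standing in for PPO's clipped surrogate update, a KL check performed after each update iteration, and a rollback variant that restores the parameters from just before the violating iteration. The names `update_epoch`, `target_kl`, and `rollback` are hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_categorical(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def update_epoch(logits, advantages, K=10, lr=0.5, target_kl=0.01, rollback=False):
    """Run up to K update iterations; stop early once the KL exceeds target_kl.

    Returns the final logits and the number of iterations attempted.
    Assumption: the KL is measured from the data-collecting (old) policy
    to the updated policy, and is checked after each iteration.
    """
    old_probs = softmax(logits)  # policy that collected the data
    theta = list(logits)
    for i in range(K):
        prev = list(theta)  # kept so the rollback variant can undo this step
        probs = softmax(theta)
        # Gradient of sum_a pi(a) * A(a) with respect to the softmax logits.
        baseline = sum(p * a for p, a in zip(probs, advantages))
        grad = [p * (a - baseline) for p, a in zip(probs, advantages)]
        theta = [t + lr * g for t, g in zip(theta, grad)]
        kl = kl_categorical(old_probs, softmax(theta))
        if kl > target_kl:
            if rollback:
                theta = prev  # rollback variant: discard the violating update
            return theta, i + 1
    return theta, K


if __name__ == "__main__":
    start, adv = [0.0, 0.0, 0.0], [1.0, 0.0, -1.0]
    for flag in (False, True):
        theta, n = update_epoch(start, adv, rollback=flag)
        kl = kl_categorical(softmax(start), softmax(theta))
        print(f"rollback={flag}: iterations attempted={n}, final KL={kl:.4f}")
```

In this sketch, `K` acts only as an upper bound on the number of update iterations: the loop ends as soon as the policy has drifted past `target_kl`, which is the sense in which early stopping adjusts the actual number of iterations within an epoch.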

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
