Jisuanji kexue yu tansuo (May 2024)

Self-competitive Hindsight Experience Replay with Penalty Measures

  • WANG Zihao, QIAN Xuezhong, SONG Wei

DOI
https://doi.org/10.3778/j.issn.1673-9418.2303031
Journal volume & issue
Vol. 18, no. 5
pp. 1223–1231

Abstract


Self-competitive hindsight experience replay (SCHER) is an improved strategy built on the hindsight experience replay (HER) algorithm. HER copes with sparse environmental rewards by replaying experiences to generate virtually labeled data for optimizing the model. However, HER has two problems: first, it cannot handle the large amount of repetitive data generated under sparse rewards, which contaminates the experience pool; second, virtual goals may be randomly selected from intermediate states that do not help complete the task, leading to learning bias. To address these issues, the SCHER algorithm introduces two improvements: first, an adaptive reward signal that penalizes meaningless actions so that the agent quickly learns to avoid such operations; second, a self-competition strategy that generates two sets of data for the same task, compares them, and identifies the key steps that allow the agent to succeed in different environments, thereby improving the accuracy of the generated virtual goals. Experimental results show that the SCHER algorithm makes better use of the experience replay technique, raising the average task success rate by 5.7 percentage points, with higher accuracy and generalization ability.
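
The following is a minimal, hedged sketch of the two ideas described in the abstract, not the authors' implementation: all names (Transition, relabel_with_penalty, self_competitive_goal) and the concrete penalty and divergence rules are assumptions made only to illustrate penalized HER-style relabeling and self-competitive virtual-goal selection.

```python
# Hypothetical sketch based on the abstract; the penalty rule ("state barely
# changes") and the divergence-based goal choice are assumptions, not the
# paper's exact method.
from dataclasses import dataclass
import numpy as np


@dataclass
class Transition:
    state: np.ndarray
    action: int
    next_state: np.ndarray
    goal: np.ndarray
    reward: float


def relabel_with_penalty(episode, virtual_goal, penalty=-0.1, eps=1e-6):
    """HER-style relabeling plus an adaptive penalty for 'meaningless' steps.

    Assumption: a step whose state barely changes is treated as meaningless
    and receives `penalty` instead of the usual sparse reward.
    """
    relabeled = []
    for t in episode:
        if np.linalg.norm(t.next_state - t.state) < eps:  # no visible progress
            r = penalty
        elif np.allclose(t.next_state, virtual_goal):     # reached virtual goal
            r = 0.0
        else:                                              # ordinary sparse reward
            r = -1.0
        relabeled.append(Transition(t.state, t.action, t.next_state,
                                    virtual_goal, r))
    return relabeled


def self_competitive_goal(episode_a, episode_b):
    """Pick a virtual goal by comparing two rollouts of the same task.

    Assumption: the first state where the two trajectories diverge is taken
    as a 'key step' and reused as the virtual goal, instead of a randomly
    chosen intermediate state as in vanilla HER.
    """
    for ta, tb in zip(episode_a, episode_b):
        if not np.allclose(ta.next_state, tb.next_state):
            return ta.next_state
    return episode_a[-1].next_state  # identical rollouts: fall back to final state
```

Under these assumptions, a replay buffer would be filled with `relabel_with_penalty(episode_a, self_competitive_goal(episode_a, episode_b))` for each pair of rollouts of the same task.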

Keywords