Machines (Sep 2022)

Consistent Experience Replay in High-Dimensional Continuous Control with Decayed Hindsights

  • Xiaoyun Feng

DOI
https://doi.org/10.3390/machines10100856
Journal volume & issue
Vol. 10, no. 10
p. 856

Abstract


The manipulation of complex robots, which is in general high-dimensional continuous control without an accurate dynamic model, calls for studies and applications of reinforcement learning (RL) algorithms. Typically, RL learns with the objective of maximizing the accumulated rewards from interactions with the environment. In practice, designing external rewards is non-trivial, as they depend on either expert knowledge or domain priors. Recent advances in hindsight experience replay (HER) instead enable a robot to learn from automatically generated sparse and binary rewards, indicating whether it reaches the desired goals or pseudo goals. However, HER inevitably introduces hindsight bias that skews the optimal control, since replays against the achieved pseudo goals may often differ from the exploration of the desired goals. To tackle this problem, we analyze the skewed objective and introduce decayed hindsight (DH), which enables consistent multi-goal experience replay by countering the bias between exploration and hindsight replay. We implement DH for goal-conditioned RL in both online and offline settings. Experiments on online robotic control tasks demonstrate that DH achieves the best average performance and is competitive with state-of-the-art replay strategies. Experiments on offline robotic control tasks show that DH substantially improves the ability to extract near-optimal policies from offline datasets.
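To make the replay mechanism described above concrete, the sketch below illustrates HER-style goal relabeling with sparse binary rewards and a per-transition decay weight. It is a minimal, hypothetical illustration only: the function name, the exponential decay schedule, and the goal tolerance are assumptions for demonstration and do not reproduce the paper's exact DH formulation.

```python
import numpy as np

def relabel_with_decayed_hindsight(episode, decay=0.98, goal_tol=0.05):
    """Hindsight relabeling with a per-transition decay weight (sketch).

    `episode` is a list of dicts with keys 'obs', 'action', 'next_obs',
    'achieved_goal', 'desired_goal'. The decay schedule below is a
    placeholder assumption, not the formula from the paper.
    """
    relabeled = []
    T = len(episode)
    for t, tr in enumerate(episode):
        # 'future' strategy: relabel with a goal achieved at a later step.
        future_idx = np.random.randint(t, T)
        pseudo_goal = episode[future_idx]['achieved_goal']

        # Sparse binary reward: 0 if the pseudo goal is reached, else -1.
        dist = np.linalg.norm(tr['achieved_goal'] - pseudo_goal)
        reward = 0.0 if dist < goal_tol else -1.0

        # Hypothetical decay: down-weight hindsight samples whose pseudo
        # goal lies farther (in time) from the current transition, so that
        # replay stays closer to the exploration of the desired goals.
        weight = decay ** (future_idx - t)

        relabeled.append({
            'obs': tr['obs'],
            'action': tr['action'],
            'next_obs': tr['next_obs'],
            'goal': pseudo_goal,
            'reward': reward,
            'weight': weight,
        })
    return relabeled
```

In such a scheme the weights would scale the per-sample loss in the goal-conditioned RL update, so heavily relabeled transitions contribute less to the learned policy; the actual weighting used by DH is defined in the paper itself.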

Keywords