IEEE Access (Jan 2024)

Discovering and Exploiting Skills in Hierarchical Reinforcement Learning

  • Zhigang Huang

DOI
https://doi.org/10.1109/ACCESS.2024.3491339
Journal volume & issue
Vol. 12
pp. 163042–163055

Abstract


Humans can perform a virtually unlimited variety of skills. These skills typically encode abstract knowledge that is strongly tied to temporal sequences. To make agents behave more like humans, we take a long-term planning perspective to discover and exploit skills (DES) in hierarchical reinforcement learning. We view skill learning as a progression from primitive skills to advanced skills and ensure that the latter retain sufficient exploration capability. DES discovers skills at the level of a trajectory segment spanning the skill length, rather than at the level of individual states and actions. It assigns the skill inference loss from the recurrent neural network evenly to each time step, maximizing skill differentiation so that skills cover fine-grained local regions. Furthermore, DES exploits skills adaptively: it builds on multi-step skill combinations and makes switching decisions according to the relative advantages of the previous and the newly estimated skill, thereby forming long-term skills. These advanced skills allow the agent to escape local regions without sacrificing flexibility, and a skill-truncation limit prevents excessive exploration. Moreover, we verify the necessity of the discovery and exploitation components from the perspectives of skill inference and exploration capability, respectively. Our experimental analysis demonstrates the superiority of DES on continuous control tasks with sparse rewards and explains the benefits of our methods.
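The abstract names two concrete mechanisms: spreading an RNN skill-inference loss evenly over the time steps of a trajectory segment, and switching skills only when a newly estimated skill has a relative advantage, subject to a truncation limit. The following is a minimal sketch, not the authors' implementation; the network shapes, the `margin` and `max_steps` hyperparameters, and all function names are illustrative assumptions.

    # Sketch of (1) per-step skill-inference loss over a trajectory segment
    # and (2) advantage-based skill switching with truncation. All names,
    # shapes, and hyperparameters below are assumptions for illustration.
    import torch
    import torch.nn as nn

    class SkillInference(nn.Module):
        """Infers a skill latent from a (state, action) trajectory segment."""
        def __init__(self, obs_dim, act_dim, skill_dim, hidden=128):
            super().__init__()
            self.rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, skill_dim)

        def forward(self, traj):  # traj: (batch, T, obs_dim + act_dim)
            out, _ = self.rnn(traj)   # hidden state at every time step
            return self.head(out)     # per-step skill logits: (batch, T, skill_dim)

    def per_step_inference_loss(logits, skill_idx):
        """Assign the skill-inference loss evenly to each time step,
        rather than scoring only the final RNN state."""
        b, t, k = logits.shape
        target = skill_idx.unsqueeze(1).expand(b, t)  # same skill label per step
        return nn.functional.cross_entropy(
            logits.reshape(b * t, k), target.reshape(b * t))

    def maybe_switch_skill(adv_prev, adv_new, steps_on_skill,
                           margin=0.0, max_steps=50):
        """Keep the previous skill unless the newly estimated skill shows a
        relative advantage; force re-selection at the truncation limit to
        prevent excessive exploration (margin/max_steps are assumed)."""
        if steps_on_skill >= max_steps:
            return True                      # truncation: always re-select
        return adv_new > adv_prev + margin   # switch only on relative gain

Averaging the loss over all `b * t` step-level predictions is one straightforward way to realize "assigns the skill inference loss evenly to each time step"; the margin-based comparison mirrors the described switching on relative advantages.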

Keywords