Jisuanji kexue (Jan 2023)

Sparse Reward Exploration Method Based on Trajectory Perception

  • ZHANG Qiyang, CHEN Xiliang, ZHANG Qiao

DOI
https://doi.org/10.11896/jsjkx.220700010
Journal volume & issue
Vol. 50, no. 1
pp. 262–269

Abstract

When dealing with sparse reward problems, existing deep reinforcement learning algorithms struggle with exploration: because they rely only on the pre-designed environment reward, it is difficult for them to achieve good results. In this situation, rewards must be designed more carefully, with more accurate judgment of and feedback on the agent's exploration status. The asynchronous advantage actor-critic (A3C) algorithm improves training efficiency and speed through parallel training, but in sparse reward environments it still cannot solve the hard-exploration problem well. To address the poor exploration performance of A3C in sparse reward environments, A3C based on exploration trajectory perception (ETP-A3C) is proposed. When the agent has difficulty exploring during training, the algorithm perceives the agent's exploration trajectory, judges and decides its exploration direction, and helps it escape the exploration dilemma as soon as possible. To verify the effectiveness of ETP-A3C, comparative experiments against baseline algorithms are carried out in five different environments of Super Mario Brothers. The results show that the proposed method significantly improves learning speed and model stability.
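The abstract describes trajectory perception only at a high level and gives no implementation details. Purely as an illustrative sketch of the general shape of such a mechanism, and not the authors' actual ETP-A3C design, the Python snippet below adds a novelty-style intrinsic bonus, computed from a rolling window of recent states, to the sparse environment reward before an actor-critic update. Every name and constant here (`TrajectoryPerceiver`, `shape_reward`, the cosine-similarity measure, the window and bonus sizes) is a hypothetical choice.

```python
import numpy as np


class TrajectoryPerceiver:
    """Hypothetical trajectory-perception module (illustration only,
    not the paper's ETP-A3C design).

    Keeps a rolling window of recent states and emits an intrinsic
    bonus that grows as the current state diverges from that window,
    so an agent circling the same region earns no extra reward while
    one breaking into new territory does.
    """

    def __init__(self, window=50, bonus=0.1):
        self.window = window   # how many recent states to remember
        self.bonus = bonus     # scale of the intrinsic reward
        self.recent_states = []

    @staticmethod
    def _similarity(a, b):
        # Cosine similarity between flattened state vectors.
        a, b = np.ravel(a), np.ravel(b)
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-8
        return float(a @ b) / denom

    def shape_reward(self, state, env_reward):
        # Novelty = 1 - mean similarity to the recent trajectory;
        # the shaped reward is the sparse environment reward plus
        # a bonus proportional to that novelty.
        if self.recent_states:
            sims = [self._similarity(state, s) for s in self.recent_states]
            env_reward += self.bonus * (1.0 - float(np.mean(sims)))
        self.recent_states.append(np.asarray(state, dtype=float))
        if len(self.recent_states) > self.window:
            self.recent_states.pop(0)
        return env_reward


if __name__ == "__main__":
    # Stand-in loop: a real A3C worker would take `state` and
    # `env_reward` from its own environment copy at each step.
    rng = np.random.default_rng(0)
    perceiver = TrajectoryPerceiver()
    state = rng.standard_normal(16)
    for _ in range(200):
        shaped = perceiver.shape_reward(state, env_reward=0.0)
        state = state + 0.01 * rng.standard_normal(16)
    print("last shaped reward:", shaped)
```

In an A3C-style setup, each parallel worker would hold its own perceiver and feed the shaped reward, rather than the raw sparse one, into its advantage estimates; the window size trades off how quickly a stalled trajectory is noticed against tolerance for ordinary revisits.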

Keywords