Jisuanji Kexue (Computer Science), Jan 2022
Upper Confidence Bound Exploration with Fast Convergence
Abstract
Deep reinforcement learning methods have achieved excellent results in control tasks with large state spaces, and exploration has long been a research hotspot in this field. Existing exploration algorithms still suffer from problems such as blind exploration and slow learning. To address these problems, an upper confidence bound exploration method with fast convergence (FAST-UCB) is proposed. The method uses UCB to explore the environment and improve exploration efficiency. To alleviate the overestimation of Q values and balance exploration and exploitation, a Q-value clipping technique is added. Then, to balance the bias and variance of the algorithm and help the agent learn quickly, a long short-term memory (LSTM) unit is added to the network model, and an improved mixed Monte Carlo method is used to compute the network error. Finally, FAST-UCB is applied to the deep Q-network (DQN) and compared with the ε-greedy and UCB algorithms in a control environment to verify its effectiveness. In addition, the proposed algorithm is compared with noisy network exploration, bootstrapped exploration, the asynchronous advantage actor-critic (A3C) algorithm and the proximal policy optimization (PPO) algorithm in the Atari 2600 environment to verify its generalization. The experimental results show that FAST-UCB achieves excellent results in both environments.
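The abstract combines three ingredients: UCB action selection, Q-value clipping, and a mixed Monte Carlo error. The sketch below illustrates common textbook forms of each; it is not the paper's FAST-UCB. It assumes a UCB1-style count bonus c·sqrt(ln t / N(a)), reads "Q-value clipping" as a clipped double-Q target (bootstrapping from the minimum of two estimates), and uses the standard mixed target (1 − β)·TD + β·MC rather than the paper's unspecified "improved" variant. The function names and the parameters `c`, `gamma` and `beta` are hypothetical, and the LSTM network is not modeled.

```python
import math


def ucb_action(q_values, counts, t, c=2.0):
    """UCB1-style selection: argmax_a Q(a) + c * sqrt(ln t / N(a)).

    Any action that has never been tried (N(a) == 0) is returned first.
    """
    for a, n in enumerate(counts):
        if n == 0:
            return a
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def clipped_td_target(reward, next_q1, next_q2, gamma=0.99, done=False):
    """Clipped double-Q target (assumed reading of 'Q-value clipping').

    The greedy next action is chosen with the first estimator, and the
    bootstrap value is the minimum of both estimators at that action,
    which counteracts overestimation.
    """
    if done:
        return reward
    a_star = max(range(len(next_q1)), key=next_q1.__getitem__)
    return reward + gamma * min(next_q1[a_star], next_q2[a_star])


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo returns G_i = r_i + gamma * G_{i+1} for one finished episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for i in reversed(range(len(rewards))):
        g = rewards[i] + gamma * g
        returns[i] = g
    return returns


def mixed_target(td_target, mc_return, beta=0.1):
    """Standard mixed Monte Carlo target: blend a low-variance TD target
    with a low-bias MC return using weight beta."""
    return (1.0 - beta) * td_target + beta * mc_return


if __name__ == "__main__":
    # The rarely tried action 1 gets a large exploration bonus.
    print(ucb_action([1.0, 1.2], [10, 1], t=11))
    g = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
    y_td = clipped_td_target(1.0, [2.0, 3.0], [4.0, 1.0], gamma=0.5)
    print(mixed_target(y_td, g[0], beta=0.25))
```

In this sketch, β controls the bias/variance trade-off the abstract mentions: β = 0 recovers the pure TD (clipped double-Q) target, while β = 1 uses only the Monte Carlo return.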
Keywords