Jisuanji kexue (Jan 2022)

Upper Confidence Bound Exploration with Fast Convergence

  • AO Tian-yu, LIU Quan

DOI
https://doi.org/10.11896/jsjkx.201100194
Journal volume & issue
Vol. 49, no. 1
pp. 298 – 305

Abstract


Deep reinforcement learning methods have achieved excellent results in control tasks with large state spaces, and exploration has long been a research hotspot in this field. Existing exploration algorithms suffer from problems such as blind exploration and slow learning. To address these problems, an upper confidence bound exploration with fast convergence (FAST-UCB) method is proposed. The method uses UCB to explore the environment, improving exploration efficiency. To alleviate the overestimation of Q values and balance the relationship between exploration and exploitation, a Q-value clipping technique is added. Then, to balance the bias and variance of the algorithm and enable the agent to learn quickly, a long short-term memory (LSTM) unit is added to the network model, and an improved mixed Monte Carlo method is used to compute the network error. Finally, FAST-UCB is applied to the deep Q-network and compared with epsilon-greedy and UCB algorithms in control environments to verify its effectiveness. The proposed algorithm is also compared with noisy network exploration, bootstrapped exploration, the asynchronous advantage actor-critic algorithm, and the proximal policy optimization algorithm in the Atari 2600 environment to verify its generalization. Experimental results show that the FAST-UCB algorithm achieves excellent results in both environments.
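The abstract does not give the exact FAST-UCB selection rule, but the UCB exploration it builds on is the standard UCB1 rule: pick the action maximizing the estimated value plus an exploration bonus that shrinks as the action is tried more often. A minimal sketch (the function name, the exploration constant `c`, and the toy values are illustrative assumptions, not from the paper):

```python
import math

def ucb_action(q_values, counts, t, c=2.0):
    """Standard UCB1 action selection.

    q_values: current value estimate per action
    counts:   number of times each action has been taken
    t:        total number of steps so far
    c:        exploration constant (assumed; not specified in the abstract)
    """
    best, best_score = 0, float("-inf")
    for a, (q, n) in enumerate(zip(q_values, counts)):
        # Untried actions get an infinite bonus, so each is tried once.
        score = float("inf") if n == 0 else q + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best, best_score = a, score
    return best

# Toy usage: action 2 has never been tried, so it is selected first.
print(ucb_action([1.0, 0.5, 0.0], [10, 4, 0], t=14))  # -> 2
```

Once every action has a nonzero count, the bonus term trades off exploitation (high `q`) against exploration (low `n`), which is the efficiency gain the abstract attributes to UCB over blind epsilon-greedy exploration.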

Keywords