Symmetry (Jun 2021)

A Self-Adaptive Reinforcement-Exploration Q-Learning Algorithm

  • Lieping Zhang,
  • Liu Tang,
  • Shenglan Zhang,
  • Zhengzhong Wang,
  • Xianhao Shen,
  • Zuqiong Zhang

DOI: https://doi.org/10.3390/sym13061057
Journal volume & issue: Vol. 13, No. 6, p. 1057

Abstract

To address several problems of the traditional Q-Learning algorithm, such as heavily repetitive and imbalanced exploration, a reinforcement-exploration strategy was used to replace the decaying ε-greedy strategy of the traditional Q-Learning algorithm, yielding a novel self-adaptive reinforcement-exploration Q-Learning (SARE-Q) algorithm. First, the concept of a behavior utility trace was introduced, and the probability of each action being chosen was adjusted according to this trace, so as to improve the efficiency of exploration. Second, the decay of the exploration factor ε was designed in two phases: the first phase centered on exploration, while the second shifted the focus from exploration to exploitation, with the exploration rate dynamically adjusted according to the success rate. Finally, by maintaining a list of state access counts, the exploration factor of the current state was adaptively adjusted according to the number of times that state had been visited. A symmetric grid-map environment was built on the OpenAI Gym platform to carry out simulation experiments comparing the Q-Learning algorithm, the self-adaptive Q-Learning (SA-Q) algorithm, and the SARE-Q algorithm. The experimental results show that the proposed algorithm has clear advantages over the first two algorithms in the average number of turns, the average success rate, and the number of runs attaining the shortest planned route.
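To make the three mechanisms in the abstract concrete, the Python sketch below shows one way they could fit together: a behavior utility trace that biases random exploration away from recently used actions, a two-phase ε decay modulated by the success rate, and a per-state visit count that shrinks ε for frequently visited states. The class name SAREQExplorer, all constants (trace_decay, phase1_episodes, the 0.5 and 0.1 scaling factors), and the specific update rules are illustrative assumptions, not the paper's formulas.

```python
import numpy as np


class SAREQExplorer:
    """Minimal sketch of SARE-Q-style exploration. All constants and
    update rules are assumed for illustration; the paper's exact
    formulas may differ."""

    def __init__(self, n_states, n_actions,
                 eps_start=1.0, eps_mid=0.5, eps_end=0.05,
                 trace_decay=0.9, phase1_episodes=200):
        self.q = np.zeros((n_states, n_actions))      # Q-table
        self.trace = np.zeros((n_states, n_actions))  # behavior utility trace
        self.visits = np.zeros(n_states, dtype=int)   # state access counts
        self.eps_start, self.eps_mid, self.eps_end = eps_start, eps_mid, eps_end
        self.trace_decay = trace_decay
        self.phase1 = phase1_episodes

    def epsilon(self, state, episode, success_rate):
        # Phase 1: linear decay from eps_start to eps_mid (exploration-centered).
        # Phase 2: linear decay from eps_mid to eps_end (shift to exploitation).
        if episode < self.phase1:
            base = self.eps_start - (self.eps_start - self.eps_mid) * episode / self.phase1
        else:
            frac = min(1.0, (episode - self.phase1) / self.phase1)
            base = self.eps_mid - (self.eps_mid - self.eps_end) * frac
        # Dynamic adjustment by success rate (assumed form): the better the
        # agent is doing, the less it explores.
        base *= 1.0 - 0.5 * success_rate
        # Frequently visited states get a smaller exploration factor.
        return max(self.eps_end, base / (1.0 + 0.1 * self.visits[state]))

    def select_action(self, state, episode, success_rate, rng):
        self.visits[state] += 1
        if rng.random() < self.epsilon(state, episode, success_rate):
            # Weight random exploration inversely to the utility trace, so
            # recently repeated actions are less likely to be re-explored.
            weights = 1.0 / (1.0 + self.trace[state])
            probs = weights / weights.sum()
            action = int(rng.choice(len(probs), p=probs))
        else:
            action = int(np.argmax(self.q[state]))
        # Decay all traces, then reinforce the chosen state-action pair.
        self.trace *= self.trace_decay
        self.trace[state, action] += 1.0
        return action


if __name__ == "__main__":
    # Tiny usage demo on a hypothetical 5x5 grid (25 states, 4 actions).
    rng = np.random.default_rng(0)
    explorer = SAREQExplorer(n_states=25, n_actions=4)
    a = explorer.select_action(state=0, episode=10, success_rate=0.2, rng=rng)
    print("chosen action:", a)
```

The inverse-trace weighting is what makes exploration "reinforced" rather than uniform: instead of ε-greedy's equal-probability random choice, under-explored actions receive a higher selection probability, which is one plausible reading of the behavior-utility-trace mechanism described above.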

Keywords