Monte Carlo Tree Search for High-dimensional Continuous Control Space

LIU Tian-xing, LI Wei, XU Zheng, ZHANG Li-hua, QI Xiao-ya, GAN Zhong-xue

doi:10.11896/jsjkx.201000129

Jisuanji kexue (Oct 2021)

Affiliations

LIU Tian-xing, LI Wei, XU Zheng, ZHANG Li-hua, QI Xiao-ya, GAN Zhong-xue: Institute of AI and Robotics, Fudan University, Shanghai 200433, China; Jihua Laboratory, Foshan, Guangdong 528000, China

DOI: https://doi.org/10.11896/jsjkx.201000129
Journal volume & issue: Vol. 48, no. 10
pp. 30 – 36

Abstract

Monte Carlo tree search (MCTS) has achieved great success in low-dimensional discrete control tasks. However, many real-world tasks require selecting actions sequentially in a continuous action space. Kernel regression UCT (KR-UCT) is a successful approach for low-dimensional continuous action spaces: it uses a pre-defined kernel function to exploit the similarity between different continuous actions. However, KR-UCT performs poorly in high-dimensional continuous action spaces because it does not use the information gained from interaction between the agent and the environment. In addition, when interacting with the environment, KR-UCT must perform a large number of simulations at each step to find the best action. To address these problems, this paper proposes a method named kernel regression UCT with policy-value network (KRPV). The proposed method filters more representative actions out of the action space for MCTS to search over, and generalizes information across different states to prune the search tree. The method is evaluated on four continuous control tasks from OpenAI Gym. The experimental results show that KRPV outperforms KR-UCT on all tested continuous control tasks. In particular, on the six-dimensional HalfCheetah-v2 task, the rewards gained by KRPV are six times those of KR-UCT.
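The abstract does not give the KR-UCT formulas, so the following is only a minimal sketch of the general idea it describes: sharing value estimates between similar continuous actions through a kernel. It assumes a Gaussian kernel and a UCB-style selection rule; the function names, the bandwidth, and the exploration constant `c` are illustrative assumptions and are not taken from the paper.

```python
import math


def gaussian_kernel(a, b, bandwidth=0.5):
    """Similarity of two action vectors (Gaussian kernel is an assumption)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kr_estimates(actions, mean_values, visit_counts, bandwidth=0.5):
    """Kernel-regression value and kernel-weighted visit count per action.

    Each action's value is a visit-weighted average over all sampled
    actions, weighted by how similar they are to it.
    """
    values, densities = [], []
    for a in actions:
        weights = [
            gaussian_kernel(a, b, bandwidth) * n
            for b, n in zip(actions, visit_counts)
        ]
        density = sum(weights)
        if density > 0:
            value = sum(w * v for w, v in zip(weights, mean_values)) / density
        else:
            value = 0.0
        values.append(value)
        densities.append(density)
    return values, densities


def select_action(actions, mean_values, visit_counts, c=1.0, bandwidth=0.5):
    """Pick the index maximising value plus a UCB-style exploration bonus."""
    values, densities = kr_estimates(actions, mean_values, visit_counts, bandwidth)
    total = max(sum(densities), 1.0)  # guard so the logarithm is non-negative

    def score(i):
        if densities[i] == 0:
            return math.inf  # no information at all: explore first
        return values[i] + c * math.sqrt(math.log(total) / densities[i])

    return max(range(len(actions)), key=score)


# Two nearby, well-visited actions and one distant, rarely visited action.
actions = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0)]
mean_values = [1.0, 0.0, 0.5]
visit_counts = [10, 10, 1]
values, densities = kr_estimates(actions, mean_values, visit_counts)
```

In this toy setup the two nearby actions pull each other's estimates toward a common value, while the distant action keeps roughly its own estimate and receives a larger exploration bonus because little kernel-weighted experience lies near it.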

Monte Carlo tree search|high-dimensional continuous action space|deep neural network|reinforcement learning|kernel regression UCT

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
