Jisuanji kexue (Oct 2021)

Monte Carlo Tree Search for High-dimensional Continuous Control Space

  • LIU Tian-xing, LI Wei, XU Zheng, ZHANG Li-hua, QI Xiao-ya, GAN Zhong-xue

DOI
https://doi.org/10.11896/jsjkx.201000129
Journal volume & issue
Vol. 48, no. 10
pp. 30 – 36

Abstract

Monte Carlo tree search (MCTS) has achieved great success in low-dimensional discrete control tasks. However, many real-world tasks require selecting actions sequentially in a continuous action space. Kernel regression UCT (KR-UCT) is a successful attempt at low-dimensional continuous action spaces: it uses a pre-defined kernel function to exploit the similarity between different continuous actions. However, KR-UCT performs poorly in high-dimensional continuous action spaces, because it does not use the information gained from interaction between the agent and the environment, and it needs to perform a large number of simulations at each step to find the best action. To solve this problem, this paper proposes a method named kernel regression UCT with policy-value network (KRPV). The proposed method selects more representative actions from the action space on which to perform MCTS, and generalizes information across different states to prune the search tree. The method has been evaluated on four continuous control tasks from OpenAI Gym. The experimental results show that KRPV outperforms KR-UCT on all tested continuous control tasks. In particular, on the six-dimensional HalfCheetah-v2 task, the rewards gained by KRPV are six times those of KR-UCT.
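
The kernel-regression idea behind KR-UCT that the abstract mentions can be illustrated with a minimal sketch. It is not the paper's implementation and does not include the policy-value network of KRPV. The Gaussian kernel, the bandwidth, the exploration constant `c`, and all function names are assumptions chosen for illustration. Each sampled action borrows value statistics from nearby actions in proportion to kernel similarity, and a UCB-style rule then picks the next action to simulate.

```python
import math

def gaussian_kernel(a, b, bandwidth=0.5):
    # Similarity of two action vectors; a Gaussian kernel is an assumption here.
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def kr_estimates(actions, mean_values, visit_counts, bandwidth=0.5):
    # For each action, return a kernel-weighted value estimate and a
    # kernel density (an effective visit count shared with neighbours).
    values, densities = [], []
    for a in actions:
        weights = [gaussian_kernel(a, b, bandwidth) * n
                   for b, n in zip(actions, visit_counts)]
        density = sum(weights)
        if density > 0:
            value = sum(w * v for w, v in zip(weights, mean_values)) / density
        else:
            value = 0.0
        values.append(value)
        densities.append(density)
    return values, densities

def select_action(actions, mean_values, visit_counts, c=1.0, bandwidth=0.5):
    # UCB-style selection using kernel-smoothed values and densities.
    values, densities = kr_estimates(actions, mean_values, visit_counts, bandwidth)
    total = sum(densities)

    def ucb(i):
        if densities[i] == 0:
            return math.inf  # never explored, nor any neighbour
        bonus = math.sqrt(max(0.0, math.log(total)) / densities[i])
        return values[i] + c * bonus

    return max(range(len(actions)), key=ucb)

# Hypothetical statistics for three one-dimensional actions.
actions = [(0.0,), (0.1,), (2.0,)]
mean_values = [1.0, 0.0, 0.5]
visit_counts = [10, 10, 1]
values, densities = kr_estimates(actions, mean_values, visit_counts)
chosen = select_action(actions, mean_values, visit_counts)
```

In this toy example the two nearby actions pool their statistics, so the estimate for the first action is pulled toward the average of both, while the isolated, rarely visited action keeps a low density and therefore a large exploration bonus.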

Keywords