IEEE Access (Jan 2020)

A Reinforcement Learning Scheme for Active Multi-Debris Removal Mission Planning With Modified Upper Confidence Bound Tree Search

  • Jianan Yang,
  • Xiaolei Hou,
  • Yu Hen Hu,
  • Yong Liu,
  • Quan Pan

DOI
https://doi.org/10.1109/ACCESS.2020.3001311
Journal volume & issue
Vol. 8
pp. 108461 – 108473

Abstract


The growing amount of space debris has a critical impact on the space environment, and active multi-debris removal (ADR) mission planning with a maximal-reward objective is attracting increasing attention. Because the goal of Reinforcement Learning (RL) coincides with the maximal-reward optimization model of ADR, an advanced RL scheme and RL algorithm can make this planning more efficient. In this paper, an RL formulation of the ADR mission planning problem is first presented, in which all the basic components of the maximal-reward optimization model are recast in the RL scheme. Second, a modified Upper Confidence Bound Tree (UCT) search algorithm is developed for the ADR planning task. It leverages neural-network-assisted selection and expansion procedures to facilitate exploration, and it incorporates roll-out simulation in the backup procedure to achieve robust value estimation. This algorithm fits the RL scheme of ADR mission planning and better balances exploration and exploitation. Experimental comparisons on three subsets of the Iridium 33 debris cloud data show that the modified UCT outperforms previously reported results and closely related UCT variants.
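The abstract describes the RL recasting and the UCT procedures only at a high level. The following is a minimal sketch of plain UCT (UCB1 selection, one-node expansion, random roll-out, backup of cumulative returns) on a toy ADR sequencing problem, to show how those pieces fit together. It is not the authors' modified algorithm: the neural-network-assisted selection and expansion are replaced here by uniform choices, and the formulation is an assumption made for illustration (state = current debris, visited set, remaining transfer budget; action = next debris to visit; reward = value of the removed debris). The names `REWARD`, `COST`, `BUDGET`, `uct_search`, and `plan`, and all numbers, are hypothetical and are not Iridium 33 data. Children are scored with UCB1, i.e. mean return plus `c * sqrt(ln N_parent / n_child)`.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical toy instance (NOT Iridium 33 data): index 0 is the servicer's
# start, REWARD[j] is the value of removing debris j, COST[i][j] is an assumed
# transfer cost from debris i to debris j, BUDGET is the total transfer budget.
REWARD = [0.0, 3.0, 5.0, 2.0, 4.0]
COST = [[0, 2, 4, 3, 5],
        [2, 0, 2, 3, 4],
        [4, 2, 0, 2, 3],
        [3, 3, 2, 0, 2],
        [5, 4, 3, 2, 0]]
BUDGET = 8.0

# Assumed RL state: (current debris, visited debris, remaining budget).
State = Tuple[int, frozenset, float]


def legal_actions(state: State) -> List[int]:
    """Actions: unvisited debris that are still reachable within the budget."""
    cur, visited, budget = state
    return [j for j in range(len(REWARD))
            if j not in visited and COST[cur][j] <= budget]


def step(state: State, j: int) -> Tuple[State, float]:
    """Transfer to debris j, pay its cost, and collect its removal reward."""
    cur, visited, budget = state
    return (j, visited | {j}, budget - COST[cur][j]), REWARD[j]


@dataclass
class Node:
    state: State
    parent: Optional["Node"] = None
    reward: float = 0.0  # reward of the action that led to this node
    children: Dict[int, "Node"] = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0  # sum of returns from the parent's state via this node


def rollout(state: State, rng: random.Random) -> float:
    """Default policy: pick reachable debris uniformly until none is left."""
    total, acts = 0.0, legal_actions(state)
    while acts:
        state, r = step(state, rng.choice(acts))
        total += r
        acts = legal_actions(state)
    return total


def uct_search(root_state: State, iterations: int, c: float,
               rng: random.Random) -> int:
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend by UCB1 while the current node is fully expanded.
        while True:
            acts = legal_actions(node.state)
            if not acts or len(node.children) < len(acts):
                break
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value_sum / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
        # Expansion: add one untried action (uniformly, no learned prior).
        untried = [a for a in legal_actions(node.state) if a not in node.children]
        if untried:
            a = rng.choice(untried)
            next_state, r = step(node.state, a)
            node.children[a] = Node(next_state, parent=node, reward=r)
            node = node.children[a]
        # Simulation from the new node, then backup of cumulative returns.
        ret = rollout(node.state, rng)
        while node is not None:
            ret += node.reward
            node.visits += 1
            node.value_sum += ret
            node = node.parent
    # Commit to the most visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)


def plan(iterations: int = 2000, c: float = 5.0, seed: int = 0):
    """Re-run UCT at every decision step and return (sequence, total reward)."""
    rng = random.Random(seed)
    state: State = (0, frozenset({0}), BUDGET)
    sequence, total = [], 0.0
    while legal_actions(state):
        a = uct_search(state, iterations, c, rng)
        state, r = step(state, a)
        sequence.append(a)
        total += r
    return sequence, total
```

Calling `plan()` re-runs the search at each decision step (a receding-horizon choice made for this sketch) and returns the visit order and the collected reward. On this toy instance, exhaustive enumeration gives the unique optimum 1, 2, 3, 4 with total reward 14 at exactly the budget of 8, which makes a convenient check for the search; the exploration constant `c` is set here on the scale of the rewards.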

Keywords