A Reinforcement Learning Scheme for Active Multi-Debris Removal Mission Planning With Modified Upper Confidence Bound Tree Search

Jianan Yang; Xiaolei Hou; Yu Hen Hu; Yong Liu; Quan Pan

doi:10.1109/ACCESS.2020.3001311

IEEE Access (Jan 2020)

Affiliations

Jianan Yang: School of Automation, Northwestern Polytechnical University, Xi’an, China
Xiaolei Hou: School of Automation, Northwestern Polytechnical University, Xi’an, China
Yu Hen Hu: Department of Electrical and Computer Engineering, University of Wisconsin–Madison, Madison, WI, USA
Yong Liu: School of Automation, Northwestern Polytechnical University, Xi’an, China
Quan Pan: School of Automation, Northwestern Polytechnical University, Xi’an, China

DOI: https://doi.org/10.1109/ACCESS.2020.3001311
Journal volume & issue: Vol. 8, pp. 108461–108473

Abstract

The growing amount of space debris poses a critical threat to the space environment. Mission planning for active multi-debris removal (ADR) with a maximal-reward objective is attracting increasing attention. Because the goal of Reinforcement Learning (RL) aligns with the maximal-reward optimization model of ADR, an advanced RL scheme and algorithm can make the planning more efficient. This paper first presents an RL formulation of the ADR mission planning problem, in which all the basic components of the maximal-reward optimization model are recast in the RL scheme. Second, it develops a modified Upper Confidence bound Tree (UCT) search algorithm for the ADR planning task. The algorithm uses neural-network-assisted selection and expansion procedures to facilitate exploration and incorporates roll-out simulation in the backup procedure to achieve robust value estimation. It fits the RL scheme of ADR mission planning and better balances exploration and exploitation. Experimental comparison on three subsets of the Iridium 33 debris cloud data shows that the modified UCT outperforms previously reported results and closely related UCT variants.
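
For readers unfamiliar with UCT, the selection step that such algorithms build on can be sketched as follows. This is the textbook UCB1 rule only, not the modified variant the paper proposes: the abstract does not give the neural-network-assisted scoring or the roll-out backup in enough detail to reproduce. The function names, the `(total_reward, visits)` bookkeeping, the debris labels, and the default exploration constant `c` are all assumptions made for illustration.

```python
import math


def ucb1_score(mean_value, child_visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCB1 score: exploitation term plus exploration bonus.

    Hypothetical helper; the paper's modified score is not given in the abstract.
    """
    if child_visits == 0:
        # Unvisited children are tried first.
        return math.inf
    return mean_value + c * math.sqrt(math.log(parent_visits) / child_visits)


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the action whose child node has the highest UCB1 score.

    `children` maps an action (e.g. the next debris to visit) to a
    (total_reward, visits) pair. This layout is an assumption.
    """
    def score(item):
        _, (total_reward, visits) = item
        mean = total_reward / visits if visits else 0.0
        return ucb1_score(mean, visits, parent_visits, c)

    return max(children.items(), key=score)[0]


# Illustrative statistics only, not data from the paper.
stats = {"debris_a": (9.0, 10), "debris_b": (0.5, 1)}
greedy_choice = select_child(stats, parent_visits=11, c=0.0)
exploring_choice = select_child(stats, parent_visits=11)
```

With `c=0.0` the rule is purely greedy and picks the child with the higher mean reward. With the default `c` the rarely visited child receives a larger exploration bonus and is selected instead, which is the exploration and exploitation trade-off the abstract refers to.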

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
