Hierarchical policy with deep-reinforcement learning for nonprehensile multiobject rearrangement

Fan Bai; Fei Meng; Jianbang Liu; Jiankun Wang; Max Q.-H. Meng

Biomimetic Intelligence and Robotics (Sep 2022)

Affiliations

Fan Bai: Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin N.T., Hong Kong SAR, China
Fei Meng: Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin N.T., Hong Kong SAR, China
Jianbang Liu: Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin N.T., Hong Kong SAR, China
Jiankun Wang: Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
Max Q.-H. Meng: Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin N.T., Hong Kong SAR, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China; Shenzhen Research Institute of the Chinese University of Hong Kong, Shenzhen, China; Corresponding author.

Journal volume & issue: Vol. 2, no. 3, p. 100047

Abstract

Nonprehensile multiobject rearrangement is the robotic task of planning feasible paths and transferring multiple objects to their predefined target poses without grasping. The planner must consider both how each object reaches its target and the order in which the objects are moved, which considerably increases the complexity of the problem. To address this, we propose a hierarchical policy for nonprehensile multiobject rearrangement based on deep-reinforcement learning. We use imitation learning and reinforcement learning to train a rollout policy. In the high-level policy, a policy network guides the Monte Carlo tree search algorithm to efficiently search for the optimal rearrangement sequence of the objects. In the low-level policy, the robot plans paths according to the order of path primitives and manipulates the objects toward their target poses one by one. Our experiments show that the proposed method achieves a higher success rate, fewer steps, and shorter path length than state-of-the-art methods.
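
To make the high-level idea concrete, the following is a minimal sketch of a prior-guided Monte Carlo tree search over object orderings. It is not the authors' implementation: the one-dimensional workspace, the `path_length` cost, the `uniform_policy` prior (a placeholder for the learned policy network), and the random rollout (a placeholder for the trained rollout policy) are all assumptions chosen only for illustration.

```python
import math
import random


def path_length(order, starts, targets, robot=0.0):
    """Distance a pusher travels on a line when objects are moved in `order`."""
    total = 0.0
    for i in order:
        total += abs(starts[i] - robot)       # travel to the object
        total += abs(targets[i] - starts[i])  # push it to its target
        robot = targets[i]                    # pusher ends at the target
    return total


class Node:
    def __init__(self, prior):
        self.prior = prior    # prior probability assigned by the policy
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}    # object index -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def uniform_policy(remaining):
    """Hypothetical stand-in for a policy network: uniform prior."""
    return {i: 1.0 / len(remaining) for i in remaining}


def search_order(starts, targets, simulations=300, c_puct=1.5, seed=0):
    """Return the cheapest ordering seen during the search and its cost."""
    rng = random.Random(seed)
    n = len(starts)
    root = Node(1.0)
    best_order, best_cost = None, math.inf

    for _ in range(simulations):
        node, order, path = root, [], [root]
        # Selection: descend with a PUCT-style score until an unexpanded node.
        while len(order) < n:
            if not node.children:
                remaining = [i for i in range(n) if i not in order]
                for i, p in uniform_policy(remaining).items():
                    node.children[i] = Node(p)
                break
            sqrt_n = math.sqrt(node.visits)

            def score(a, parent=node, sqrt_n=sqrt_n):
                child = parent.children[a]
                bonus = c_puct * child.prior * sqrt_n / (1 + child.visits)
                return child.q() + bonus

            choice = max(node.children, key=score)
            order.append(choice)
            node = node.children[choice]
            path.append(node)

        # Rollout: complete the ordering at random (placeholder rollout policy).
        rest = [i for i in range(n) if i not in order]
        rng.shuffle(rest)
        full = order + rest
        cost = path_length(full, starts, targets)
        if cost < best_cost:
            best_order, best_cost = full, cost

        # Backup: reward is the negative path length.
        for visited in path:
            visited.visits += 1
            visited.value_sum -= cost

    return best_order, best_cost
```

Calling `search_order([4.0, 1.0, 7.0], [5.0, 2.0, 9.0])` returns a permutation of the object indices together with its path length. In a fuller system one would presumably replace the uniform prior with network outputs and rescale the reward so that the exploration term is not dominated by the raw cost.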

Published in Biomimetic Intelligence and Robotics

ISSN: 2097-0242 (Print); 2667-3797 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/biomimetic-intelligence-and-robotics
