AI (Aug 2022)

Hierarchical DDPG for Manipulator Motion Planning in Dynamic Environments

  • Dugan Um,
  • Prasad Nethala,
  • Hocheol Shin

DOI
https://doi.org/10.3390/ai3030037
Journal volume & issue
Vol. 3, no. 3
pp. 645–658

Abstract

In this paper, a hierarchical reinforcement learning (HRL) architecture, namely a “Hierarchical Deep Deterministic Policy Gradient (HDDPG)”, is proposed and studied. The HDDPG uses a manager-worker formation similar to other HRL structures. Unlike others, however, the HDDPG lets the manager and all workers share an identical environment and state, while each Deep Deterministic Policy Gradient (DDPG) agent requires its own reward function. The HDDPG therefore allows easy structural expansion, with the manager selecting a worker probabilistically. Owing to this structural advantage, the HDDPG is well suited to building a general AI for complex, long-time-horizon tasks with various conflicting sub-goals. The experimental results demonstrate its usefulness on a manipulator motion-planning problem in a dynamic environment, where path planning and collision avoidance conflict with each other. The proposed HDDPG is compared with a hierarchical abstract machine (HAM) and a single DDPG for performance evaluation. The HDDPG achieved more than a 40% reward gain and more than twice the reward improvement rate. Another important feature of the proposed HDDPG is biased manager training: by adding a preference factor to each worker, the manager can be trained to prefer a certain worker, achieving a better success rate for a specific objective when needed.
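To make the manager-worker formation concrete, the sketch below shows one way the ingredients named in the abstract (shared state, a separate reward per DDPG agent, probabilistic worker selection by the manager, and a preference factor) could fit together. It is not code from the paper; the class and method names, the linear placeholder policy, and the softmax selection rule are all illustrative assumptions.

```python
import numpy as np

class DDPGAgent:
    """Stand-in for a standard DDPG agent (actor-critic with replay
    buffer and target networks); only the interface the hierarchy
    needs is sketched here."""
    def __init__(self, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical linear actor in place of the actor network.
        self.W = rng.standard_normal((action_dim, state_dim)) * 0.1

    def act(self, state):
        return np.tanh(self.W @ state)  # deterministic policy output

    def store_and_train(self, state, action, reward, next_state):
        pass  # replay-buffer insert and actor/critic updates would go here


class HDDPG:
    """Minimal sketch: a manager DDPG and several worker DDPGs observe
    the same environment state; the manager's action is read as
    preference logits over workers, and one worker is picked
    probabilistically to produce the motor command."""
    def __init__(self, state_dim, action_dim, n_workers, preference=None):
        self.manager = DDPGAgent(state_dim, n_workers, seed=0)
        self.workers = [DDPGAgent(state_dim, action_dim, seed=i + 1)
                        for i in range(n_workers)]
        # Optional bias: preference factors nudge the manager toward a worker.
        self.preference = (np.ones(n_workers) if preference is None
                           else np.asarray(preference, dtype=float))

    def act(self, state):
        logits = self.manager.act(state) * self.preference
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                       # softmax over workers
        k = np.random.choice(len(self.workers), p=probs)
        return k, self.workers[k].act(state)       # chosen worker drives the arm

    def train(self, k, state, action, rewards, next_state):
        # Each agent learns from its own reward over the shared transition.
        self.manager.store_and_train(state, action, rewards["manager"], next_state)
        self.workers[k].store_and_train(state, action, rewards[k], next_state)
```

Under this reading, two workers would correspond to the paper's conflicting sub-goals (one rewarded for progress along the path, one for collision avoidance), each trained on its own reward signal over the shared state, while the preference vector realizes the biased manager training described above.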

Keywords