Safe Reinforcement Learning for Arm Manipulation with Constrained Markov Decision Process

Patrick Adjei; Norman Tasfi; Santiago Gomez-Rosero; Miriam A. M. Capretz

doi:10.3390/robotics13040063

Robotics (Apr 2024)

Affiliations

Patrick Adjei: Electrical and Computer Engineering, Western University, London, ON N6A 3K7, Canada
Norman Tasfi: Electrical and Computer Engineering, Western University, London, ON N6A 3K7, Canada
Santiago Gomez-Rosero: Electrical and Computer Engineering, Western University, London, ON N6A 3K7, Canada
Miriam A. M. Capretz: Electrical and Computer Engineering, Western University, London, ON N6A 3K7, Canada

DOI: https://doi.org/10.3390/robotics13040063
Journal volume & issue: Vol. 13, no. 4, p. 63

Abstract

In the world of human–robot coexistence, ensuring safe interactions is crucial. Traditional logic-based methods often lack the intuition required for robots, particularly in complex environments where these methods fail to account for all possible scenarios. Reinforcement learning has shown promise in robotics due to its superior adaptability over traditional logic. However, the exploratory nature of reinforcement learning can jeopardize safety. This paper addresses the challenges in planning trajectories for robotic arm manipulators in dynamic environments. In addition, this paper highlights the pitfalls of multiple reward compositions that are susceptible to reward hacking. A novel method with a simplified reward and constraint formulation is proposed. This enables the robot arm to avoid a nonstationary obstacle that never resets, enhancing operational safety. The proposed approach combines scalarized expected returns with a constrained Markov decision process through a Lagrange multiplier, resulting in better performance. The scalarization component uses the indicator cost function value, directly sampled from the replay buffer, as an additional scaling factor. This method is particularly effective in dynamic environments where conditions change continually, as opposed to approaches relying solely on the expected cost scaled by a Lagrange multiplier.
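The abstract does not state the exact formulation, so the following is only a generic sketch of how a Lagrange multiplier can fold an indicator cost into a scalar learning signal. It is not the paper's algorithm. The names `Transition`, `penalized_reward`, `update_multiplier`, `cost_limit`, and `lr` are hypothetical, as are the sample values in the batch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    """One replay-buffer sample (hypothetical layout, for illustration only)."""
    reward: float  # task reward r
    cost: float    # indicator cost c in {0, 1}; 1 means the constraint was violated


def penalized_reward(t: Transition, lam: float) -> float:
    """Scalarize reward and cost as r - lam * c (generic Lagrangian relaxation)."""
    return t.reward - lam * t.cost


def update_multiplier(lam: float, batch: List[Transition],
                      cost_limit: float, lr: float) -> float:
    """Dual ascent step: raise lam when the mean sampled cost exceeds the limit.

    The result is clipped at zero because a Lagrange multiplier for an
    inequality constraint must stay non-negative.
    """
    mean_cost = sum(t.cost for t in batch) / len(batch)
    return max(0.0, lam + lr * (mean_cost - cost_limit))


# Assumed toy batch: two safe transitions and two constraint violations.
batch = [Transition(1.0, 0.0), Transition(0.5, 1.0),
         Transition(0.8, 1.0), Transition(1.2, 0.0)]

lam = update_multiplier(lam=0.0, batch=batch, cost_limit=0.25, lr=0.5)
shaped = [penalized_reward(t, lam) for t in batch]
```

In this toy setup half of the sampled transitions violate the constraint, which exceeds the assumed limit of 0.25, so the multiplier grows and the violating transitions receive a lower scalarized signal than their raw reward.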

Published in Robotics

ISSN: 2218-6581 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Mechanical engineering and machinery
Website: http://www.mdpi.com/journal/robotics
