Robotics (Apr 2024)
Safe Reinforcement Learning for Arm Manipulation with Constrained Markov Decision Process
Abstract
Where humans and robots coexist, ensuring safe interaction is crucial. Traditional logic-based methods often lack the intuition robots require, particularly in complex environments where such methods cannot account for every possible scenario. Reinforcement learning has shown promise in robotics because it adapts better than traditional logic. However, its exploratory nature can jeopardize safety. This paper addresses the challenges of planning trajectories for robotic arm manipulators in dynamic environments. It also highlights the pitfalls of composing multiple reward terms, which leaves the agent susceptible to reward hacking. A novel method with a simplified reward and constraint formulation is proposed. The method enables the robot arm to avoid a nonstationary obstacle that never resets, enhancing operational safety. The proposed approach combines scalarized expected returns with a constrained Markov decision process through a Lagrange multiplier, resulting in better performance. The scalarization component uses the value of the indicator cost function, sampled directly from the replay buffer, as an additional scaling factor. Compared with approaches that rely solely on the expected cost scaled by a Lagrange multiplier, this method is particularly effective in dynamic environments where conditions change continually.
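As a rough illustration of the difference described above, the following sketch contrasts a plain Lagrangian objective with one in which the sampled indicator cost also scales the penalty. The abstract does not give the exact formula, so the `(1 + c)` weighting, the function names, the cost limit, and the learning rate are all assumptions made for illustration, not the paper's implementation.

```python
def lagrangian_objective(q_reward, q_cost, lam):
    """Baseline: expected cost scaled only by the Lagrange multiplier."""
    return q_reward - lam * q_cost


def indicator_scaled_objective(q_reward, q_cost, cost_indicator, lam):
    """Assumed variant: the 0/1 cost sampled from the replay buffer
    additionally scales the penalty (the (1 + c) form is hypothetical)."""
    return q_reward - lam * (1.0 + cost_indicator) * q_cost


def update_multiplier(lam, mean_cost, cost_limit, lr=0.01):
    """Projected dual ascent: raise lam when the mean cost exceeds the
    limit, lower it otherwise, and never let it go below zero."""
    return max(0.0, lam + lr * (mean_cost - cost_limit))


# Toy replay-buffer batch: (expected return, expected cost, indicator cost).
batch = [(1.0, 0.2, 0), (2.0, 0.4, 1), (0.5, 0.1, 0)]
lam = 0.5

baseline = [lagrangian_objective(qr, qc, lam) for qr, qc, _ in batch]
scaled = [indicator_scaled_objective(qr, qc, c, lam) for qr, qc, c in batch]

mean_cost = sum(c for _, _, c in batch) / len(batch)
new_lam = update_multiplier(lam, mean_cost, cost_limit=0.1)
```

In this toy batch only the second transition has a nonzero indicator cost, so only its objective is penalized more heavily than under the baseline; transitions with zero sampled cost are treated identically by both forms.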
Keywords