Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning

Leen Kweider; Maissa Abou Kassem; Ubai Sandouk

doi:10.1109/ACCESS.2024.3486549

IEEE Access (Jan 2024)

Affiliations

Leen Kweider: Department of Artificial Intelligence, Faculty of Information Technology, Damascus University, Damascus, Syria
Maissa Abou Kassem: Department of Artificial Intelligence, Faculty of Information Technology, Damascus University, Damascus, Syria
Ubai Sandouk: Department of Software Engineering, Faculty of Information Technology, Damascus University, Damascus, Syria

DOI: https://doi.org/10.1109/ACCESS.2024.3486549
Journal volume & issue: Vol. 12, pp. 157140–157148

Abstract

Deploying artificial intelligence (AI) in decision-making applications requires an appropriate level of safety and reliability, particularly in changing environments that contain many unknown observations. To address this challenge, we propose a novel safe reinforcement learning (RL) approach that uses anomalous state sequences to enhance RL safety. Our proposed solution, Safe Reinforcement Learning with Anomalous State Sequences (AnoSeqs), consists of two stages. First, we train an agent in a non-safety-critical offline ‘source’ environment to collect safe state sequences. Next, we use these safe sequences to build an anomaly detection model that can detect potentially unsafe state sequences in a ‘target’ safety-critical environment where failures can have high costs. The risk estimated by the anomaly detection model is then used to train a risk-averse RL policy in the target environment: the reward function is adjusted to penalize the agent for visiting anomalous states that the anomaly model deems unsafe. In experiments on multiple safety-critical benchmarking environments, including self-driving cars, our approach successfully learns safer policies and shows that sequential anomaly detection can provide an effective supervisory signal for training safety-aware RL agents.
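
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the anomaly model or the exact form of the penalty, so the nearest-window detector, the names `NearestWindowDetector` and `shaped_reward`, and the parameters `k`, `scale`, and `penalty_weight` are all assumptions made for illustration. The sketch scores a recent window of states by its distance to the closest window seen in safe source trajectories, then subtracts a weighted risk from the environment reward.

```python
import math


def windows(states, k):
    """Return all length-k sliding windows over a list of state vectors."""
    return [states[i:i + k] for i in range(len(states) - k + 1)]


def window_distance(a, b):
    """Euclidean distance between two equal-length windows of state vectors."""
    return math.sqrt(sum((x - y) ** 2
                         for sa, sb in zip(a, b)
                         for x, y in zip(sa, sb)))


class NearestWindowDetector:
    """Hypothetical stand-in for the anomaly model (an assumption, not the
    paper's model): risk grows with the distance from the current window
    to the closest window collected in the safe source environment."""

    def __init__(self, safe_trajectories, k, scale=1.0):
        self.k = k
        self.scale = scale
        # Stage 1 output: every length-k window from the safe trajectories.
        self.safe = [w for traj in safe_trajectories for w in windows(traj, k)]

    def risk(self, window):
        """Map nearest-neighbour distance to a risk value in [0, 1)."""
        d = min(window_distance(window, s) for s in self.safe)
        return 1.0 - math.exp(-d / self.scale)


def shaped_reward(reward, risk, penalty_weight=1.0):
    """Stage 2: penalize the agent in proportion to the estimated risk."""
    return reward - penalty_weight * risk
```

In a training loop for the target environment, one would keep the last `k` observed states, call `risk` on that window at each step, and pass `shaped_reward(reward, risk)` to the RL algorithm in place of the raw reward. A larger `penalty_weight` makes the resulting policy more risk-averse at the expense of task reward.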

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
