IEEE Access (Jan 2024)
Applying Quantitative Model Checking to Analyze Safety in Reinforcement Learning
Abstract
Reinforcement learning (RL) is increasingly used in safety-centric applications. However, many studies focus on generating an optimal policy that achieves maximum rewards. While maximum rewards are beneficial, safety constraints and non-functional requirements must also be considered in safety-centric applications to avoid dangerous situations. For example, in the case of food delivery robots in restaurants, RL should be used not only to find an optimal policy that responds to all customer requests by maximizing rewards, but also to satisfy safety constraints such as collision avoidance and non-functional requirements such as battery saving. In this paper, we investigated how well learning models generated through RL fulfill safety constraints and non-functional requirements, using quantitative model checking. We experimented with various time steps and learning rates for RL, targeting restaurant delivery robots. The functional requirement of these robots is to process all customer order requests, and the non-functional requirements are the number of steps and the battery consumption needed to complete the task. The safety constraints are the number of collisions and the probability of collision. Through these experiments, we made three important findings. First, learning models that obtain maximum rewards may achieve non-functional requirements and safety constraints only to a low degree. Second, as safety constraints are satisfied, the degree of achievement of non-functional requirements may decrease. Third, even if the maximum reward is not obtained, sacrificing non-functional requirements can maximize the achievement of safety constraints. These results show that learning models generated through RL can trade off rewards to achieve safety constraints. In conclusion, our work can contribute to selecting suitable hyperparameters and optimal learning models during RL.
Keywords