Integrating Risk-Averse and Constrained Reinforcement Learning for Robust Decision-Making in High-Stakes Scenarios

Moiz Ahmad; Muhammad Babar Ramzan; Muhammad Omair; Muhammad Salman Habib

doi:10.3390/math12131954

Mathematics (Jun 2024)

Affiliations

Moiz Ahmad: Department of Industrial and Manufacturing Engineering, University of Engineering and Technology, Lahore 54700, Pakistan
Muhammad Babar Ramzan: School of Engineering and Technology, National Textile University, Faisalabad 37610, Pakistan
Muhammad Omair: Department of Materials and Production, Aalborg University, 9220 Aalborg Øst, Denmark
Muhammad Salman Habib: Institute of Knowledge Services, Center for Creative Convergence Education, Hanyang University ERICA Campus, Ansan-si 15588, Gyeonggi-do, Republic of Korea

DOI: https://doi.org/10.3390/math12131954
Journal volume & issue: Vol. 12, no. 13, p. 1954

Abstract

This paper considers a risk-averse Markov decision process (MDP) with non-risk constraints as a dynamic optimization framework for ensuring robustness against unfavorable outcomes in high-stakes sequential decision-making situations such as disaster response. Strong duality is proved for this problem without any assumption of convexity. This matters for real-world problems in which convexity cannot be guaranteed, e.g., disaster relief problems that involve deprivation costs. The theoretical results imply that the problem can be solved exactly in the dual domain, where it becomes convex. Building on these duality results, an augmented Lagrangian-based constraint-handling mechanism is developed for risk-averse reinforcement learning algorithms, and the mechanism is proved to converge. Finally, the convergence of the mechanism is demonstrated empirically on a multi-stage disaster response relief allocation problem, with a fixed negative reward scheme used as a benchmark.
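The contrast between an augmented Lagrangian mechanism and a fixed penalty can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes the classical augmented Lagrangian for a single inequality constraint g(x) <= 0, applied to a hypothetical one-dimensional problem (minimize (x - 2)^2 subject to x <= 1), with the inner minimization done by brute-force search over a grid so that no convexity is exploited. The names `augmented_lagrangian`, `fixed_penalty`, `rho`, and `c` are illustrative assumptions, and the fixed penalty only stands in loosely for a fixed negative reward scheme.

```python
def augmented_lagrangian(f, g, candidates, rho=1.0, iters=50):
    """Approximately minimise f(x) subject to g(x) <= 0 over a finite set.

    Toy illustration only: uses the classical augmented Lagrangian for one
    inequality constraint, with multiplier update lam <- max(0, lam + rho*g).
    """
    lam = 0.0
    x = candidates[0]
    for _ in range(iters):
        def augmented(y):
            # Penalty term is active only where lam + rho * g(y) > 0.
            shifted = max(0.0, lam + rho * g(y))
            return f(y) + (shifted ** 2 - lam ** 2) / (2.0 * rho)

        # Inner step: brute-force search, so no convexity is assumed.
        x = min(candidates, key=augmented)
        # Outer step: adapt the multiplier to the observed violation.
        lam = max(0.0, lam + rho * g(x))
    return x, lam


def fixed_penalty(f, g, candidates, c):
    """Benchmark: a constant penalty c per unit of constraint violation."""
    return min(candidates, key=lambda y: f(y) + c * max(0.0, g(y)))


# Hypothetical toy problem: cost (x - 2)^2 with the constraint x <= 1.
cost = lambda x: (x - 2.0) ** 2
constraint = lambda x: x - 1.0
grid = [i / 1000 for i in range(3001)]  # candidate decisions in [0, 3]

x_al, lam_al = augmented_lagrangian(cost, constraint, grid)
x_pen = fixed_penalty(cost, constraint, grid, c=1.0)

print("augmented Lagrangian:", x_al, "multiplier:", round(lam_al, 3))
print("fixed penalty (c=1): ", x_pen)
```

In this toy setting the adaptive multiplier settles near the value that makes the constrained optimum x = 1 optimal for the augmented objective, whereas a fixed penalty that happens to be too small (here c = 1) leaves the constraint violated. A fixed scheme would need its penalty tuned by hand for each problem, which is the kind of shortcoming an adaptive constraint-handling mechanism is meant to avoid.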

Published in Mathematics

ISSN: 2227-7390 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/mathematics
