Finding the Optimal Security Policies for Autonomous Cyber Operations With Competitive Reinforcement Learning

Garrett Mcdonald; Li Li; Ranwa Al Mallah

doi:10.1109/ACCESS.2024.3446310

IEEE Access (Jan 2024)

Affiliations

Garrett Mcdonald: Department of Electrical and Computer Engineering, Royal Military College of Canada, Kingston, ON, Canada
Li Li: Defence Research and Development Canada, Toronto, ON, Canada
Ranwa Al Mallah: Department of Electrical and Computer Engineering, Royal Military College of Canada, Kingston, ON, Canada

DOI: https://doi.org/10.1109/ACCESS.2024.3446310
Journal volume & issue: Vol. 12
pp. 120292 – 120305

Abstract

Reinforcement Learning (RL) has been responsible for some of the most impressive advances in the field of Artificial Intelligence (AI). Research in competitive RL has shown that multiple agents competing in an adversarial environment can learn simultaneously to discover their optimal decision-making policies. Competitive RL algorithms have been used to train performant AI for a variety of games and optimization problems. Cybersecurity is a domain where emerging research in competitive RL is being considered for real-world application. Various open-source environments are available to simulate network security incidents for developing Automated Cyber Operations (ACO) tools with RL. However, existing research in these environments is typically one-sided: a Red or Blue agent is trained to optimize its decision-making against a static opponent. Competitive RL has not been attempted in these emerging environments. In this work, we trained agents using competitive RL to approximate their game-theoretically optimal policies in a simulated ACO environment. We showed that near-optimal behavior was reached gradually through fictitious play, demonstrating that these strategies can be used to approximate the optimal policies for agents involved in sophisticated sequential decision-making during a cyber attack.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
