IEEE Access (Jan 2024)

Finding the Optimal Security Policies for Autonomous Cyber Operations With Competitive Reinforcement Learning

  • Garrett McDonald,
  • Li Li,
  • Ranwa Al Mallah

DOI
https://doi.org/10.1109/ACCESS.2024.3446310
Journal volume & issue
Vol. 12
pp. 120292 – 120305

Abstract

Reinforcement Learning (RL) has been responsible for some of the most impressive advances in the field of Artificial Intelligence (AI). Research in competitive RL has shown that multiple agents competing in an adversarial environment can learn simultaneously to discover their optimal decision-making policies. Competitive RL algorithms have been used to train performant AI for a variety of games and optimization problems. Cybersecurity is a domain where emerging research in competitive RL is being considered for real-world application. Various open-source environments are available to simulate network security incidents for developing Autonomous Cyber Operations (ACO) tools using RL. However, the existing research in these environments is typically one-sided: a Red or Blue agent is trained to optimize its decision-making against a static opponent. Competitive RL has not been attempted in these emerging environments. In this work, we trained agents using competitive RL to approximate their game-theoretically optimal policies in a simulated ACO environment. We showed that near-optimal behavior was reached gradually through fictitious play, demonstrating that these strategies can be used to approximate the optimal policies for agents involved in sophisticated sequential decision-making during a cyber attack.
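
As an illustration of the fictitious play idea mentioned in the abstract, the following is a minimal sketch of classical tabular fictitious play on a two-player zero-sum matrix game. It is not the authors' training procedure, which uses RL agents in a simulated ACO environment. The toy payoff matrix (Red attacks one of two hosts, Blue defends one of two hosts) and the function name `fictitious_play` are assumptions made purely for illustration.

```python
def fictitious_play(payoff, iterations=10000):
    """Classical fictitious play on a zero-sum matrix game.

    payoff[i][j] is the reward to the row player (Red) when Red plays
    action i and the column player (Blue) plays action j; Blue receives
    the negative. Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols

    # Arbitrary opening moves so that both histories are non-empty.
    row_counts[0] += 1
    col_counts[0] += 1

    for _ in range(iterations - 1):
        # Red's payoff for each action against Blue's empirical history.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Red's payoff for each Blue action against Red's empirical history.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Each player best-responds to the opponent's history:
        # Red maximizes its payoff, Blue minimizes Red's payoff.
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = min(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    red_policy = [c / sum(row_counts) for c in row_counts]
    blue_policy = [c / sum(col_counts) for c in col_counts]
    return red_policy, blue_policy


if __name__ == "__main__":
    # Hypothetical toy game: Red scores 1 when it attacks the undefended host.
    toy_payoff = [[0, 1],
                  [1, 0]]
    red, blue = fictitious_play(toy_payoff)
    print("Red empirical policy: ", red)
    print("Blue empirical policy:", blue)
```

In this toy game the unique equilibrium is for both players to mix uniformly, and the empirical frequencies drift toward that mix as the number of iterations grows. This mirrors, at a much smaller scale, the gradual approach to near-optimal behavior described in the abstract.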

Keywords