Optimal Deception Asset Deployment in Cybersecurity: A Nash Q-Learning Approach in Multi-Agent Stochastic Games

Guanhua Kong; Fucai Chen; Xiaohan Yang; Guozhen Cheng; Shuai Zhang; Weizhen He

doi:10.3390/app14010357

Applied Sciences (Dec 2023)

Affiliations

Guanhua Kong: Institute of Information Technology, PLA Information Engineering University, Zhengzhou 450002, China
Fucai Chen: Institute of Information Technology, PLA Information Engineering University, Zhengzhou 450002, China
Xiaohan Yang: Institute of Information Technology, PLA Information Engineering University, Zhengzhou 450002, China
Guozhen Cheng: Institute of Information Technology, PLA Information Engineering University, Zhengzhou 450002, China
Shuai Zhang: Institute of Information Technology, PLA Information Engineering University, Zhengzhou 450002, China
Weizhen He: Institute of Information Technology, PLA Information Engineering University, Zhengzhou 450002, China

DOI: https://doi.org/10.3390/app14010357
Journal volume & issue: Vol. 14, no. 1, p. 357

Abstract

Faced with increasingly intricate network structures and a multitude of security threats, cyber deception defenders often employ deception assets to safeguard critical real assets. However, against attackers performing intranet lateral movement in the cyber kill chain, the deployment of deception assets faces three challenges: it lacks dynamics, it cannot support real-time decisions, and it does not account for dynamic changes in the attacker’s strategy. To address these issues, this study introduces a novel maze pathfinding model tailored to the lateral movement context, in which we try to locate the attacker so that deception assets can be deployed accurately for interception. The attack–defense process is modeled as a multi-agent stochastic game. After comparing it with a random action policy and the Minimax-Q algorithm, we choose Nash Q-learning to solve for the optimal deployment strategy of deception assets. Extensive simulation tests show that the proposed model exhibits good convergence properties. Moreover, the average defense success rate surpasses 70%, attesting to the model’s efficacy.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
