网络与信息安全学报 (Chinese Journal of Network and Information Security) (Dec 2023)

Autonomous security analysis and penetration testing model based on attack graph and deep Q-learning network

  • Cheng FAN, Guoqing HU, Taojie DING, Zhanhua ZHANG

DOI
https://doi.org/10.11959/j.issn.2096-109x.2023091
Journal volume & issue
Vol. 9, no. 6
pp. 166 – 175

Abstract


With the continuous development and widespread application of network technology, network security issues have become increasingly prominent. Penetration testing has emerged as an important method for assessing and enhancing network security. However, traditional manual penetration testing suffers from inefficiency, human error, and dependence on tester skill, leading to high uncertainty and poor evaluation results. To address these challenges, an autonomous security analysis and penetration testing framework called ASAPT was proposed, based on attack graphs and deep Q-learning networks (DQN). The ASAPT framework consisted of two main components: training data construction and model training. In the training data construction phase, attack graphs were used to model the threats in the target network by representing vulnerabilities and possible attack paths as nodes and edges. By integrating the Common Vulnerability Scoring System (CVSS) vulnerability database, a “state-action” transition matrix was constructed, which described the attacker’s behavior and transition probabilities in different states. This matrix captured both the attacker’s capabilities and the security status of the network. To reduce computational complexity, a depth-first search (DFS) algorithm was applied to simplify the transition matrix, identifying and preserving all attack paths that lead to the final goal for subsequent model training. In the model training phase, a deep reinforcement learning algorithm based on DQN was employed to determine the optimal attack path during penetration testing. The algorithm interacted continuously with the environment, updating the Q-value function to progressively optimize the selection of attack paths. Simulation results demonstrate that ASAPT achieves an accuracy of 84% in identifying the optimal path and converges quickly. Compared to traditional Q-learning, ASAPT adapts better to large-scale network environments and could provide guidance for practical penetration testing.
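
The abstract does not give the details of the DFS simplification step, so the following is only a minimal sketch of the general idea, assuming the transition matrix can be treated as a weighted directed graph. The names `paths_to_goal`, `prune_to_goal`, and `toy`, along with the node labels and probabilities, are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Assumption: the "state-action" transition matrix is stored sparsely as
# node -> {successor: transition probability}.
Graph = Dict[str, Dict[str, float]]


def paths_to_goal(graph: Graph, start: str, goal: str) -> List[List[str]]:
    """Enumerate every simple (cycle-free) path from start to goal using DFS."""
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == goal:
            paths.append(list(path))
            return
        for nxt in graph.get(node, {}):
            if nxt not in path:  # skip nodes already on the current path
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(start, [start])
    return paths


def prune_to_goal(graph: Graph, start: str, goal: str) -> Graph:
    """Keep only the transitions that lie on at least one path to the goal."""
    keep: Set[Tuple[str, str]] = set()
    for path in paths_to_goal(graph, start, goal):
        keep.update(zip(path, path[1:]))
    pruned: Graph = {}
    for src, successors in graph.items():
        kept = {dst: p for dst, p in successors.items() if (src, dst) in keep}
        if kept:
            pruned[src] = kept
    return pruned


# Hypothetical toy network; the probabilities are made up for illustration.
toy: Graph = {
    "entry": {"web": 0.8, "mail": 0.5},
    "web": {"db": 0.6, "printer": 0.9},
    "mail": {"db": 0.4},
    "printer": {},  # dead end: no route to the goal
    "db": {},
}

all_paths = paths_to_goal(toy, "entry", "db")
pruned = prune_to_goal(toy, "entry", "db")
```

In this toy graph the `web` to `printer` transition is dropped because no path through `printer` reaches `db`, which is the kind of reduction that would shrink the state-action space before training. Exhaustive path enumeration can be expensive on dense graphs, so this sketch is meant only to illustrate the pruning idea, not the paper's actual procedure.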

Keywords