IEEE Access (Jan 2020)
An Intelligent Deployment Policy for Deception Resources Based on Reinforcement Learning
Abstract
Traditional deception-based cyber defenses (DCD) often adopt a static deployment policy that places deception resources at fixed positions in the target network. Unfortunately, this static policy greatly restricts the effectiveness of the deception resources and allows attackers to easily identify and bypass them. Moreover, existing studies on dynamic deployment policies impose many strict assumptions and constraints and are therefore too idealistic to be practical. To overcome this limitation, we develop an intelligent deployment policy that dynamically adjusts the locations of deception resources according to the network security state. We first formulate the deception resource deployment problem and then model the attacker-defender scenario and the attacker's strategy. Next, we propose a preliminary screening method that derives effective deployment locations for deception resources based on a threat penetration graph (TPG). We then construct a reinforcement learning model for finding the optimal deployment policy and design a model-free Q-Learning training algorithm. Finally, we conduct experiments in a real-world network environment and compare our approach in depth with state-of-the-art methods. Evaluations over a large number of attacks show that our method achieves a defense success probability of nearly 80%, outperforming existing schemes.
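To make the model-free Q-Learning step concrete, below is a minimal, self-contained sketch of tabular Q-Learning for choosing a deployment location per network security state. The state labels, candidate locations, reward function, and hyperparameters are illustrative assumptions for this sketch only, not the paper's actual formulation.

```python
import random
from collections import defaultdict

# Hypothetical candidate deployment locations, e.g. hosts that survive the
# TPG-based preliminary screening described in the paper.
CANDIDATE_LOCATIONS = ["host_A", "host_B", "host_C"]

# Hypothetical discretized network security states.
SECURITY_STATES = ["low_threat", "medium_threat", "high_threat"]

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.2  # exploration probability


def choose_action(q_table, state):
    """Epsilon-greedy choice of a deployment location for the given state."""
    if random.random() < EPSILON:
        return random.choice(CANDIDATE_LOCATIONS)
    return max(CANDIDATE_LOCATIONS, key=lambda a: q_table[(state, a)])


def q_update(q_table, state, action, reward, next_state):
    """Standard model-free Q-Learning update rule."""
    best_next = max(q_table[(next_state, a)] for a in CANDIDATE_LOCATIONS)
    q_table[(state, action)] += ALPHA * (
        reward + GAMMA * best_next - q_table[(state, action)]
    )


def simulate_step(state, action):
    """Placeholder environment returning (reward, next_state).

    In the paper this feedback would come from the attacker-defender
    interaction; here a toy reward (deploying at 'host_B' under high threat)
    stands in so the example runs end to end.
    """
    reward = 1.0 if (state == "high_threat" and action == "host_B") else 0.0
    next_state = random.choice(SECURITY_STATES)
    return reward, next_state


if __name__ == "__main__":
    q_table = defaultdict(float)
    state = random.choice(SECURITY_STATES)
    for _ in range(5000):
        action = choose_action(q_table, state)
        reward, next_state = simulate_step(state, action)
        q_update(q_table, state, action, reward, next_state)
        state = next_state
    # Report the learned greedy deployment choice for each security state.
    for s in SECURITY_STATES:
        best = max(CANDIDATE_LOCATIONS, key=lambda a: q_table[(s, a)])
        print(s, "->", best)
```

The key property this sketch shares with the paper's approach is that it is model-free: the defender never needs the attacker's transition model, only observed rewards from the interaction.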
Keywords