IEEE Access (Jan 2022)
Avoiding the Hook: Influential Factors of Phishing Awareness Training on Click-Rates and a Data-Driven Approach to Predict Email Difficulty Perception
Abstract
Phishing attacks are still seen as a significant threat to cyber security, and large parts of the industry rely on anti-phishing simulations to minimize the risk such attacks pose. In this study, we conducted a large-scale anti-phishing training with more than 31,000 participants and 144 different simulated phishing attacks to develop a data-driven model that classifies how users would perceive a phishing simulation. Furthermore, we analyze the results of our large-scale anti-phishing training and give novel insights into users’ click behavior. Analyzing our anti-phishing training data, we find that 66% of users do not fall victim to credential-based phishing attacks even after being exposed to twelve weeks of phishing simulations. To further enhance the effectiveness of phishing awareness training, we developed a novel manifold learning-powered machine learning model that predicts how many people would fall for a phishing simulation, using several structural features and state-of-the-art NLP features extracted from the emails. In this way, we present a systematic approach that lets training implementers estimate the average “convincing power” of an email before rolling it out. Moreover, we identify the most important factors in the classification. In addition, our model offers significant benefits over traditional rule-based approaches in classifying the difficulty of phishing simulations. Our results clearly show that anti-phishing training should focus on training individual users rather than large user groups. Additionally, we present a promising generic machine learning model for predicting phishing susceptibility.
Keywords