Avoiding the Hook: Influential Factors of Phishing Awareness Training on Click-Rates and a Data-Driven Approach to Predict Email Difficulty Perception

Thomas Sutter; Ahmet Selman Bozkir; Benjamin Gehring; Peter Berlich

doi:10.1109/ACCESS.2022.3207272

IEEE Access (Jan 2022)

Affiliations

Thomas Sutter: Institute of Applied Information Technology, Zurich University of Applied Sciences, Winterthur, Switzerland
Ahmet Selman Bozkir: Institute of Applied Information Technology, Zurich University of Applied Sciences, Winterthur, Switzerland
Benjamin Gehring: Institute of Applied Information Technology, Zurich University of Applied Sciences, Winterthur, Switzerland
Peter Berlich: Institute of Applied Information Technology, Zurich University of Applied Sciences, Winterthur, Switzerland

DOI: https://doi.org/10.1109/ACCESS.2022.3207272
Journal volume & issue: Vol. 10
pp. 100540 – 100565

Abstract

Phishing attacks remain a significant threat to cyber security, and large parts of the industry rely on anti-phishing simulations to minimize the risk posed by such attacks. This study conducted a large-scale anti-phishing training with more than 31,000 participants and 144 different simulated phishing attacks to develop a data-driven model that classifies how users would perceive a phishing simulation. Furthermore, we analyze the results of this training and give novel insights into users’ click behavior. We find that 66% of users do not fall victim to credential-based phishing attacks even after being exposed to twelve weeks of phishing simulations. To further enhance the effectiveness of phishing awareness training, we developed a novel machine learning model, powered by manifold learning, that predicts how many people would fall for a phishing simulation using several structural and state-of-the-art NLP features extracted from the emails. In this way, we present a systematic approach for training implementers to estimate the average “convincing power” of the emails before rollout. Moreover, we identify the most important factors in the classification. In addition, our model offers significant benefits over traditional rule-based approaches in classifying the difficulty of phishing simulations. Our results clearly show that anti-phishing training should focus on the training of individual users rather than on large user groups. Additionally, we present a promising generic machine learning model for predicting phishing susceptibility.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
