Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks

Bo Sun; Tao Ban; Chansu Han; Takeshi Takahashi; Katsunari Yoshioka; Jun'ichi Takeuchi; Abdolhossein Sarrafzadeh; Meikang Qiu; Daisuke Inoue

doi:10.1109/ACCESS.2021.3082000

IEEE Access (Jan 2021)

Affiliations

Bo Sun: Department of Information Systems, Saitama Institute of Technology, Fukaya, Japan
Tao Ban: National Institute of Information and Communications Technology, Koganei, Japan
Chansu Han: National Institute of Information and Communications Technology, Koganei, Japan
Takeshi Takahashi: National Institute of Information and Communications Technology, Koganei, Japan
Katsunari Yoshioka: Graduate School of Environment and Information Sciences, Yokohama National University, Yokohama, Japan
Jun'ichi Takeuchi: Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
Abdolhossein Sarrafzadeh: Center of Excellence in Cybersecurity, North Carolina A&T State University, Greensboro, NC, USA
Meikang Qiu: Department of Computer Science and Information Systems, Texas A&M University–Commerce, Commerce, TX, USA
Daisuke Inoue: National Institute of Information and Communications Technology, Koganei, Japan

DOI: https://doi.org/10.1109/ACCESS.2021.3082000
Journal volume & issue: Vol. 9, pp. 87962–87971

Abstract

Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack uses a carefully crafted email message to persuade a victim to perform a specific, seemingly innocuous action, such as opening a link or an attachment, or downloading and installing a software program. For such an attack to succeed without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents) must contain content that is highly relevant to the target. Analyzing such decoy documents can therefore provide crucial information for inferring and identifying the targeted or potentially harmed victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method provides good prediction accuracy: the best result on a collection of 200 Chinese decoy documents was an accuracy of 97.5%, an F-measure of 97.9%, and a false positive rate (FPR) of 3.1%. The proposed scheme can be deployed at various access points to strengthen defenses against targeted email attacks.
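
For readers unfamiliar with the three metrics quoted in the abstract, the following is a minimal sketch of how accuracy, F-measure, and FPR are conventionally derived from a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' exact evaluation protocol is not described in this record.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, F-measure (F1), and false positive rate
    from the four cells of a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # FPR: share of true negatives that were wrongly flagged as positive.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "f_measure": f_measure, "fpr": fpr}


# Hypothetical counts, NOT results from the paper.
metrics = classification_metrics(tp=45, fp=5, tn=45, fn=5)
print(metrics)
```

A low FPR matters in this setting because it limits how often the documents on the negative side of the classification are wrongly flagged.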

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
