Enhancing Phishing Email Detection through Ensemble Learning and Undersampling

Qinglin Qi; Zhan Wang; Yijia Xu; Yong Fang; Changhui Wang

doi:10.3390/app13158756

Applied Sciences (Jul 2023)

Affiliations

Qinglin Qi: College of Cybersecurity, Sichuan University, Chengdu 610065, China
Zhan Wang: College of Cybersecurity, Sichuan University, Chengdu 610065, China
Yijia Xu: College of Cybersecurity, Sichuan University, Chengdu 610065, China
Yong Fang: College of Cybersecurity, Sichuan University, Chengdu 610065, China
Changhui Wang: Department of Fundamental Courses, Chengdu Textile College, Chengdu 611731, China

Journal volume & issue: Vol. 13, no. 15, p. 8756

Abstract

In real-world scenarios, phishing and benign emails are usually imbalanced in number. This imbalance biases traditional machine learning and deep learning algorithms toward benign emails and causes them to misclassify phishing emails. Such misclassification significantly threatens people’s financial and information security, yet few studies take measures to address the imbalance. To mitigate its impact on the model and improve phishing email detection, this paper proposes two new undersampling-based algorithms: the Fisher–Markov-based phishing ensemble detection (FMPED) method and the Fisher–Markov–Markov-based phishing ensemble detection (FMMPED) method. The algorithms first remove benign emails in overlapping areas, then undersample the remaining benign emails, and finally combine the retained benign emails with the phishing emails into a new training set, which is used to train ensemble learning algorithms for classification. Experimental results demonstrate that the proposed algorithms outperform other machine learning and deep learning algorithms, achieving an F1-score of 0.9945, an accuracy of 0.9945, an AUC of 0.9828, and a G-mean of 0.9827.
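
The three-stage pipeline described above (remove overlapping benign emails, undersample the rest, train an ensemble on the merged set) can be illustrated with a minimal sketch. This is not FMPED or FMMPED. The abstract does not say how overlapping emails are identified or where the Fisher–Markov components enter, so the sketch makes several assumptions. It uses a hypothetical k-nearest-neighbour overlap rule, random undersampling down to the phishing count, and a bagged nearest-centroid ensemble as a stand-in for the paper's ensemble learners. It also assumes emails are already encoded as numeric feature vectors. The G-mean helper uses the standard definition, the square root of the true-positive rate times the true-negative rate. All function names are illustrative.

```python
import math
import random

# Sketch only: emails are assumed to be numeric feature vectors (tuples);
# label 1 = phishing, 0 = benign. None of these helpers come from the paper.


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_overlap(benign, phishing, k=3):
    """Hypothetical overlap rule: drop a benign email when most of its
    k nearest neighbours (among all emails) are phishing."""
    pool = [(x, 0) for x in benign] + [(x, 1) for x in phishing]
    kept = []
    for i, x in enumerate(benign):
        nearest = sorted(
            (euclidean(x, y), label) for j, (y, label) in enumerate(pool) if j != i
        )[:k]
        phishing_votes = sum(label for _, label in nearest)
        if phishing_votes <= k // 2:  # benign-dominated neighbourhood: keep
            kept.append(x)
    return kept


def build_training_set(benign, phishing, k=3, seed=0):
    """Remove overlapping benign emails, randomly undersample the rest to the
    phishing count, and merge both classes into one balanced training set."""
    rng = random.Random(seed)
    cleaned = remove_overlap(benign, phishing, k)
    retained = rng.sample(cleaned, min(len(phishing), len(cleaned)))
    return [(x, 0) for x in retained] + [(x, 1) for x in phishing]


def centroid(rows):
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def train_bagged_centroids(train, n_models, rng):
    """Stand-in ensemble (assumption): bagging of nearest-centroid classifiers."""
    models = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]  # bootstrap resample
        benign = [x for x, y in boot if y == 0]
        phish = [x for x, y in boot if y == 1]
        if benign and phish:  # skip draws that miss a class entirely
            models.append((centroid(benign), centroid(phish)))
    return models


def predict(models, x):
    """Majority vote: 1 (phishing) if most models put x nearer the phishing centroid."""
    votes = sum(1 for c0, c1 in models if euclidean(x, c1) < euclidean(x, c0))
    return 1 if 2 * votes > len(models) else 0


def g_mean(y_true, y_pred):
    """G-mean = sqrt(TPR * TNR), with phishing as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


if __name__ == "__main__":
    # Toy data: 20 benign emails near the origin, one benign email inside
    # the phishing cluster (the overlap), and 5 phishing emails near (5, 5).
    benign = [(i * 0.1, (i % 5) * 0.1) for i in range(20)] + [(5.05, 5.0)]
    phishing = [(5 + i * 0.1, 5.0) for i in range(5)]
    train = build_training_set(benign, phishing)
    models = train_bagged_centroids(train, n_models=11, rng=random.Random(1))
    test = [((0.2, 0.1), 0), ((1.5, 0.3), 0), ((5.2, 5.0), 1), ((5.3, 4.9), 1)]
    preds = [predict(models, x) for x, _ in test]
    print("G-mean:", g_mean([y for _, y in test], preds))
```

In the toy run, the benign email at (5.05, 5.0) sits inside the phishing cluster and is removed before undersampling. Undersampling to exactly the phishing count is only one possible choice. The abstract does not state the ratio of benign to phishing emails the authors retain.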

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
