Applied Sciences (Jul 2023)

Enhancing Phishing Email Detection through Ensemble Learning and Undersampling

  • Qinglin Qi,
  • Zhan Wang,
  • Yijia Xu,
  • Yong Fang,
  • Changhui Wang

DOI
https://doi.org/10.3390/app13158756
Journal volume & issue
Vol. 13, no. 15
p. 8756

Abstract

In real-world scenarios, the numbers of phishing and benign emails are usually imbalanced, which biases traditional machine learning and deep learning algorithms towards benign emails and causes them to misclassify phishing emails. Few studies take measures to address this imbalance, and the resulting misclassifications significantly threaten people’s financial and information security. To mitigate the impact of imbalance on the model and improve phishing email detection, this paper proposes two new undersampling-based algorithms: the Fisher–Markov-based phishing ensemble detection (FMPED) method and the Fisher–Markov–Markov-based phishing ensemble detection (FMMPED) method. Both algorithms first remove benign emails that lie in overlapping areas, then undersample the remaining benign emails, and finally combine the retained benign emails with the phishing emails into a new training set, on which ensemble learning algorithms are trained and used for classification. Experimental results show that the proposed algorithms outperform other machine learning and deep learning algorithms, achieving an F1-score of 0.9945, an accuracy of 0.9945, an AUC of 0.9828, and a G-mean of 0.9827.
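The three data-preparation steps described in the abstract can be illustrated with a minimal sketch. It is not the FMPED or FMMPED procedure itself: the abstract does not say how overlapping areas are identified, so the nearest-neighbour rule below is a stand-in assumption, as are the function names, the balanced 1:1 sampling target, and the toy feature vectors.

```python
import math
import random


def nearest_label(i, X, y):
    """Return the label of the sample closest to X[i] (Euclidean distance)."""
    closest = min(
        (j for j in range(len(X)) if j != i),
        key=lambda j: math.dist(X[i], X[j]),
    )
    return y[closest]


def build_training_set(X, y, seed=0):
    """Overlap removal, then random undersampling, then recombination.

    Labels: 0 = benign (majority class), 1 = phishing (minority class).
    """
    phishing = [i for i, label in enumerate(y) if label == 1]
    benign = [i for i, label in enumerate(y) if label == 0]

    # Step 1 (assumed rule): treat a benign email as lying in an overlapping
    # area when its nearest neighbour is a phishing email, and drop it.
    kept = [i for i in benign if nearest_label(i, X, y) == 0]

    # Step 2 (assumed target): randomly undersample the remaining benign
    # emails down to the number of phishing emails.
    rng = random.Random(seed)
    sampled = rng.sample(kept, min(len(kept), len(phishing)))

    # Step 3: combine the retained benign emails with all phishing emails.
    idx = sorted(sampled + phishing)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data: six benign emails and two phishing emails in a 2-D feature space.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.2, 0.2],
     [0.1, 0.2], [0.9, 0.9], [1.0, 1.0], [0.9, 1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_new, y_new = build_training_set(X, y)
```

In this toy example the benign email at [0.9, 0.9] sits next to the phishing cluster and is removed in step 1; two of the five remaining benign emails are then sampled to match the two phishing emails. The resulting `X_new` and `y_new` would then be passed to whichever ensemble learner is being used.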

Keywords