Performance Evaluation of Phishing Classification Techniques on Various Data Sources and Schemes

Rahmad Abdillah; Zarina Shukur; Masnizah Mohd; T. S. Mohd Zamri Murah; Insu Oh; Kangbin Yim

doi:10.1109/ACCESS.2022.3225971

IEEE Access (Jan 2023)

Affiliations

Rahmad Abdillah: Center for Cyber Security (CYBER), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
Zarina Shukur: Center for Cyber Security (CYBER), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
Masnizah Mohd: Center for Cyber Security (CYBER), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
T. S. Mohd Zamri Murah: Center for Cyber Security (CYBER), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
Insu Oh: Department of Information Security Engineering, Soonchunhyang University, Asan, South Korea
Kangbin Yim: Department of Software Convergence Engineering, Soonchunhyang University, Asan, South Korea

DOI: https://doi.org/10.1109/ACCESS.2022.3225971
Journal volume & issue: Vol. 11
pp. 38721–38738

Abstract

Phishing attacks have become a perilous threat in recent years, prompting numerous studies that seek the classification technique that best detects them. Several studies have made comparisons using only specific datasets and techniques, omitting the most crucial aspect: evaluating how performance changes when the data change. Consequently, their conclusions about classification techniques cannot be generalized beyond the specific datasets and techniques used. This research therefore measured the performance of classification techniques on changing data by applying subset schemes to each dataset. The experiments used unbalanced and balanced phishing datasets, with subset schemes in ratios of 90:10, 80:20, 70:30, and 60:40. Thirteen recent classification techniques used in previous phishing studies were compared and evaluated against ten performance measures. The results show that the proposed schemes successfully reveal the maximum and minimum performance obtained by each classification technique. These comparisons can provide deeper insights into phishing classification techniques than related research offers.
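The abstract gives no implementation details, so the following is only a minimal Python sketch of how one technique might be evaluated under the four subset schemes and how the best- and worst-performing scheme for a measure could be read off. It assumes each ratio means a train:test split of the dataset. All names (`split`, `measures`, `evaluate_schemes`, `min_max`, `threshold_trainer`) are hypothetical, the toy threshold classifier merely stands in for any of the thirteen techniques, and the four confusion-matrix measures shown are common ones, not necessarily the paper's ten.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, label); assumed 1 = phishing, 0 = legitimate
Predictor = Callable[[float], int]
Trainer = Callable[[List[Sample]], Predictor]

# Subset schemes from the abstract, interpreted here as (train %, test %).
SCHEMES: List[Tuple[int, int]] = [(90, 10), (80, 20), (70, 30), (60, 40)]


def split(samples: Sequence[Sample], train_pct: int, seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle a copy of the data with a fixed seed and cut it at train_pct percent."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    cut = len(data) * train_pct // 100
    return data[:cut], data[cut:]


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0


def measures(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """A few confusion-matrix measures (an illustrative subset only)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def evaluate_schemes(samples: Sequence[Sample], train: Trainer, seed: int = 0) -> Dict[str, Dict[str, float]]:
    """Train and test one technique under every subset scheme."""
    results: Dict[str, Dict[str, float]] = {}
    for train_pct, test_pct in SCHEMES:
        train_set, test_set = split(samples, train_pct, seed)
        predict = train(train_set)
        y_true = [label for _, label in test_set]
        y_pred = [predict(x) for x, _ in test_set]
        results[f"{train_pct}:{test_pct}"] = measures(y_true, y_pred)
    return results


def min_max(results: Dict[str, Dict[str, float]], measure: str) -> Tuple[str, str]:
    """Return the schemes giving the highest and the lowest value of one measure."""
    best = max(results, key=lambda s: results[s][measure])
    worst = min(results, key=lambda s: results[s][measure])
    return best, worst


def threshold_trainer(train_set: List[Sample]) -> Predictor:
    """Toy stand-in classifier: threshold halfway between the two class means.
    Assumes both classes appear in the training set."""
    pos = [x for x, y in train_set if y == 1]
    neg = [x for x, y in train_set if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x >= cut)


if __name__ == "__main__":
    # Hypothetical synthetic data: label is 1 when the feature is at least 50.
    data = [(float(x), int(x >= 50)) for x in range(100)]
    results = evaluate_schemes(data, threshold_trainer)
    for scheme, scores in results.items():
        print(scheme, scores)
    print("best/worst scheme by accuracy:", min_max(results, "accuracy"))
```

Running every technique through the same schemes and recording, per measure, which split yields the maximum and minimum is one simple way to expose the performance range the abstract describes. The sketch uses a plain random split; for the unbalanced dataset, a stratified split that preserves the class ratio in each subset may be a better choice.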

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639