International Journal of Information Science and Management (Jan 2024)
Intelligent Phishing Website Detection before and after Multiple Informative Feature Selection Techniques: Machine Learning Approach
Abstract
Individuals and organizations that rely on the Internet for communication, collaboration, and daily tasks regularly face security and privacy threats unless intelligent cybersecurity defense systems are in place to counter them. Existing evidence shows that phishing website attacks have increased drastically despite the scientific community's best efforts to combat them. To address the key research gaps identified, this study set out to answer the following research questions: RQ#1: Which cross-validation techniques and model optimization parameters are appropriate for the given datasets and classifiers? RQ#2: Which classifier(s) yield superior Accuracy, F1-Score, AUC-ROC, and MCC values with acceptable train-test computational time before and after applying the informative feature selection techniques? RQ#3: What are the strengths and weaknesses of each classifier after applying multiple informative feature selection techniques? RQ#4: Do the results of the top-performing classifier and informative feature selection technique on Dataset One (DS-1) remain consistent on Dataset Two (DS-2)? The study used a Google Colab environment and Python code to conduct rigorous experiments. Our experimental findings reveal that the CAT-B classifier demonstrated superior phishing website detection performance in terms of Accuracy, F1-Score, AUC-ROC, and MCC value, with acceptable train-test computational time, both before and after applying the UFS feature selection technique. It scored an accuracy of 0.9764, an F1-Score of 0.9762, an AUC-ROC of 0.996, and an MCC value of 0.9528, with a train-test computational time of 6 seconds. The study also demonstrates in practice how to implement the CAT-B-UFS technique in Python so that future researchers can easily replicate the results and build on them.
For future work, the study proposes applying deep learning algorithms with appropriate feature selection techniques in both individual and hybrid approaches to obtain more promising results.
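The four evaluation metrics named in RQ#2 (Accuracy, F1-Score, AUC-ROC, and MCC) can all be computed from a binary confusion matrix and a set of ranked prediction scores. The sketch below is a minimal, standard-library illustration and is not the study's own code. The function names, the label convention (1 = phishing, 0 = legitimate), and the toy data are assumptions made for illustration. In practice, scikit-learn provides equivalent functions (`accuracy_score`, `f1_score`, `roc_auc_score`, `matthews_corrcoef`).

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels; 1 = phishing, 0 = legitimate (assumed convention)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    # Fraction of all samples classified correctly.
    return (tp + tn) / (tp + tn + fp + fn)


def f1(tp, tn, fp, fn):
    # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; 0.0 by convention when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc_roc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2).

    This pairwise form is O(P * N) and is meant for illustration, not large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores from some classifier, thresholded at 0.5).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(y_true, y_pred)
print("TP, TN, FP, FN:", counts)
print("Accuracy:", round(accuracy(*counts), 4))
print("F1-Score:", round(f1(*counts), 4))
print("MCC:", round(mcc(*counts), 4))
print("AUC-ROC:", auc_roc(y_true, scores))
```

In this toy run, one phishing sample falls below the 0.5 threshold and is missed, which lowers Accuracy, F1-Score, and MCC. AUC-ROC remains 1.0 because every positive is still ranked above every negative. This is why the abstract reports threshold-dependent and ranking-based metrics side by side.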
Keywords