Enhancing Malicious URL Detection: A Novel Framework Leveraging Priority Coefficient and Feature Evaluation

Ahmad Sahban Rafsanjani; Norshaliza Binti Kamaruddin; Mehran Behjati; Saad Aslam; Aaliya Sarfaraz; Angela Amphawan

doi:10.1109/ACCESS.2024.3412331

IEEE Access (Jan 2024)

Affiliations

Ahmad Sahban Rafsanjani: School of Engineering and Technology, Sunway University, Bandar Sunway, Selangor Darul Ehsan, Malaysia
Norshaliza Binti Kamaruddin: Faculty of Artificial Intelligence, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
Mehran Behjati: School of Engineering and Technology, Sunway University, Bandar Sunway, Selangor Darul Ehsan, Malaysia
Saad Aslam: School of Engineering and Technology, Sunway University, Bandar Sunway, Selangor Darul Ehsan, Malaysia
Aaliya Sarfaraz: School of Engineering and Technology, Sunway University, Bandar Sunway, Selangor Darul Ehsan, Malaysia
Angela Amphawan: School of Engineering and Technology, Sunway University, Bandar Sunway, Selangor Darul Ehsan, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2024.3412331
Journal volume & issue: Vol. 12, pp. 85001–85026

Abstract

Malicious Uniform Resource Locators (URLs) pose a significant cybersecurity threat by carrying out attacks such as phishing and malware propagation. Conventional malicious URL detection methods, relying on blacklists and heuristics, often struggle to identify new and obfuscated malicious URLs. To address this challenge, machine learning and deep learning have been leveraged to enhance detection capabilities, albeit relying heavily on large and frequently updated datasets. Furthermore, the efficacy of these methods is intrinsically tied to the quality of the training data, a requirement that becomes increasingly challenging to fulfill in real-world scenarios due to constraints such as data scarcity and the dynamic nature of evolving cyber threats. In this study, we introduce an innovative framework for malicious URL detection based on predefined static feature classification by allocating priority coefficients and feature evaluation methods. Our feature classification encompasses 42 classes, including blacklist, lexical, host-based, and content-based features. To validate our framework, we collected a dataset of 5000 real-world URLs from prominent phishing and malware websites, namely URLhaus and PhishTank. We assessed our framework’s performance using three supervised machine learning methods: Support Vector Machine (SVM), Random Forest (RF), and Bayesian Network (BN). The results demonstrate that our framework outperforms these methods, achieving an impressive detection accuracy of 98.95% and a precision value of 98.60%. Furthermore, we conducted a benchmarking analysis against three comprehensive malicious URL detection methods (PDRCNN, the Li method, and URLNet), demonstrating that our proposed framework excels in terms of accuracy and precision. In conclusion, our novel malicious URL detection framework substantially enhances accuracy, significantly bolstering cybersecurity defenses against emerging threats.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
