IEEE Access (Jan 2024)

Multi-Modal Comparative Analysis on Execution of Phishing Detection Using Artificial Intelligence

  • Divya Jennifer Dsouza,
  • Anisha P. Rodrigues,
  • Roshan Fernandes

DOI
https://doi.org/10.1109/ACCESS.2024.3491429
Journal volume & issue
Vol. 12
pp. 163016 – 163041

Abstract

Phishing is the practice of deceiving users in order to steal private or confidential information through illicit means. It can lead to financial loss, loss of reputation, and identity theft, so identifying phishing sites and preventing their use is crucial. In data science, an outlier, also termed an anomaly, refers to data points or series that deviate from the normal behaviour of the system under study. Anomaly detection is concerned with identifying the genuine outliers in a data set. This paper aims to present optimised techniques and multiple modes of execution for detecting phishing websites. The most relevant features are selected by applying feature extraction. The Mendeley phishing websites dataset is used to detect phishing websites, and the publicly available SPAM-HAM dataset is used to classify SMS messages as spam or ham. Experiments are also carried out on a custom dataset to avoid any bias present in the publicly available datasets. The study is carried out in three modes, namely offline, batch, and incremental, using machine learning models. Performance evaluation metrics such as accuracy, F1 score, precision, recall, and time complexity for the machine learning models, and accuracy and loss for the deep learning models, are compared across the modes. The study concludes by detailing the pros and cons of each mode and model used. The incremental mode of execution is better suited to real-time processing, achieving an accuracy of 97.1% on the custom dataset with the adaptive random forest (ARF) classifier available in Python’s River framework. With the deep learning approach, using a Keras Sequential model, the accuracy obtained was 99.28%.
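
To make the incremental mode and the reported metrics concrete, the following is a minimal sketch of test-then-train (prequential) evaluation on a data stream. It is not the authors' pipeline: the synthetic two-feature stream, the online logistic regression learner (a simple stand-in for the ARF classifier), and all names such as make_stream, OnlineLogisticRegression, and prequential_evaluate are assumptions introduced for illustration. The learn_one / predict_one method names only loosely mirror the per-sample convention used by stream-learning libraries.

```python
import math
import random


def make_stream(n, seed=0):
    """Yield (features, label) pairs; label 1 = phishing, 0 = legitimate.

    The two numeric features are synthetic and hypothetical; they stand in
    for whatever features a real extraction step would produce.
    """
    rng = random.Random(seed)
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label else -1.0
        yield [rng.gauss(centre, 0.7), rng.gauss(centre, 0.7)], label


class OnlineLogisticRegression:
    """Logistic regression updated one sample at a time with SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba_one(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict_one(self, x):
        return 1 if self.predict_proba_one(x) >= 0.5 else 0

    def learn_one(self, x, y):
        # Gradient of the log loss for a single sample.
        err = self.predict_proba_one(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def prequential_evaluate(model, stream):
    """Predict on each sample first, then use it to update the model."""
    tp = fp = tn = fn = 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # test
        if y_pred == 1 and y == 1:
            tp += 1
        elif y_pred == 1 and y == 0:
            fp += 1
        elif y_pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
        model.learn_one(x, y)  # then train
    return classification_metrics(tp, fp, tn, fn)


if __name__ == "__main__":
    model = OnlineLogisticRegression(n_features=2)
    metrics = prequential_evaluate(model, make_stream(2000))
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Because every sample is scored before the model learns from it, no separate hold-out split is needed, which is one reason this style of evaluation fits real-time settings. An offline run would instead fit once on a fixed training set, and a batch run would update the model on successive chunks of data.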

Keywords