Detecting phishing websites through improving convolutional neural networks with Self-Attention mechanism

Yahia Said; Ahmed A. Alsheikhy; Husam Lahza; Tawfeeq Shawly

Ain Shams Engineering Journal (Apr 2024)

Affiliations

Yahia Said: Electrical Engineering Department, College of Engineering, Northern Border University, Arar, Saudi Arabia; Laboratory of Electronics and Microelectronics (LR99ES30), University of Monastir, Tunisia
Ahmed A. Alsheikhy: Electrical Engineering Department, College of Engineering, Northern Border University, Arar, Saudi Arabia
Husam Lahza (corresponding author): Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Tawfeeq Shawly: Electrical Engineering Department, Faculty of Engineering at Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia

Journal volume & issue: Vol. 15, no. 4
p. 102643

Abstract

Emerging technologies have made internet access essential for reaching many services. However, being online raises many security concerns, such as the illegal acquisition of private information, passwords, and identifiers. Phishing websites are a preferred tool for attackers who try to intrude on users' private space. In these social engineering attacks, the attacker designs a fake website that resembles a real one, invites the victim to visit it, collects their sensitive information, and then redirects them to the actual site. Given the importance of detecting phishing websites, a robust detector that filters them and blocks their activity on the Internet is necessary. In this paper, we propose a phishing website detector based on a convolutional neural network (CNN) improved with a self-attention mechanism. The proposed detector analyzes Uniform Resource Locators (URLs) by treating them as strings. CNN models have proved efficient at handling text strings compared with Long Short-Term Memory (LSTM) networks, which focus on temporal features. Using a CNN allows the model to learn comprehensive features of the URLs and facilitates the detection of phishing ones. The self-attention mechanism was added to enhance the model's focus and detection accuracy. In addition, the training dataset was balanced by generating phishing URLs with a Generative Adversarial Network (GAN). A set of experiments demonstrated the robustness of the proposed detector, which achieved high detection accuracy on the test set. The detector was also tested on unknown URLs and achieved excellent results. The improved CNN reaches a detection precision of 99.7%, which is 2.74% higher than that of the regular CNN model. The reported results show that the self-attention mechanism improved the detection accuracy and made the CNN model more efficient at detecting phishing websites.
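
To make the pipeline described in the abstract more concrete, the following is a minimal sketch, in plain Python, of a forward pass that treats a URL as a character string, applies a 1D convolution, and then applies scaled dot-product self-attention before pooling. It is not the authors' implementation: the alphabet, the function names (encode_url, conv1d_relu, self_attention, phishing_score, init_params), the layer sizes, and the use of identity query/key/value projections are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random

# Hypothetical URL alphabet; id 0 is reserved for padding and unknown characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_url(url, max_len=32):
    """Treat the URL as a string: map it to a fixed-length list of character ids.

    Lowercasing and truncation are simplifications chosen for this sketch.
    """
    ids = [CHAR_TO_ID.get(ch, 0) for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def conv1d_relu(seq, kernels):
    """Valid 1D convolution over positions followed by ReLU.

    seq is T x D; kernels is a list of F filters, each K x D.
    Returns a (T - K + 1) x F feature map.
    """
    k, d = len(kernels[0]), len(seq[0])
    out = []
    for t in range(len(seq) - k + 1):
        out.append([
            max(0.0, sum(f[j][c] * seq[t + j][c] for j in range(k) for c in range(d)))
            for f in kernels
        ])
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections (assumed)."""
    d = len(seq[0])
    out = []
    for q in seq:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq])
        out.append([sum(w * v[c] for w, v in zip(weights, seq)) for c in range(d)])
    return out


def init_params(seed=0, emb_dim=8, n_filters=4, kernel_size=3):
    """Random, untrained weights; a real detector would learn these from labeled URLs."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "embedding": mat(len(ALPHABET) + 1, emb_dim),
        "kernels": [mat(kernel_size, emb_dim) for _ in range(n_filters)],
        "out_w": [rng.uniform(-0.5, 0.5) for _ in range(n_filters)],
        "out_b": 0.0,
    }


def phishing_score(url, params):
    """Forward pass: embed -> convolve -> self-attend -> max-pool -> sigmoid."""
    x = [params["embedding"][i] for i in encode_url(url)]
    h = self_attention(conv1d_relu(x, params["kernels"]))
    pooled = [max(col) for col in zip(*h)]  # global max pooling over positions
    logit = sum(w * p for w, p in zip(params["out_w"], pooled)) + params["out_b"]
    return 1.0 / (1.0 + math.exp(-logit))
```

Calling phishing_score("http://example.com/login", init_params()) returns a value between 0 and 1, but with untrained weights that value carries no meaning; the sketch only shows how the convolution extracts local character patterns and how self-attention then reweights those patterns across the whole URL.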

Published in Ain Shams Engineering Journal

ISSN: 2090-4479 (Print); 2090-4495 (Online)
Publisher: Elsevier
Country of publisher: Egypt
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: http://www.journals.elsevier.com/ain-shams-engineering-journal/
