Results in Engineering (Dec 2024)

XSShield: A novel dataset and lightweight hybrid deep learning model for XSS attack detection

  • Gia-Huy Luu,
  • Minh-Khang Duong,
  • Trong-Phuc Pham-Ngo,
  • Thanh-Sang Ngo,
  • Dat-Thinh Nguyen,
  • Xuan-Ha Nguyen,
  • Kim-Hung Le

Journal volume & issue
Vol. 24
p. 103363

Abstract

With the proliferation of web applications, cross-site scripting (XSS) attacks have increased significantly and now pose a serious threat to users' information security and privacy. Machine learning (ML) and deep learning (DL) techniques offer promising ways to improve XSS attack detection, but their effectiveness is limited by the lack of comprehensive and diverse datasets. Moreover, existing approaches often prioritize detection accuracy over real-time processing capabilities, which are essential for effective defense. To address these challenges, we propose a novel framework that automatically collects web resources, efficiently extracts informative features, and constructs an up-to-date XSS attack dataset, which is then used to train a machine learning-based XSS detection model. Using this framework, we created and published a well-structured dataset of over 100,000 samples for the research community. Furthermore, we present a hybrid detection model that leverages the strengths of both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. Extensive evaluations on our dataset demonstrate that the proposed model outperforms baseline ML models across various metrics, including processing rate. Notably, our model achieves an accuracy of 99.27% while maintaining a low false positive rate of 0.06% and a high processing rate exceeding 1000 samples per second. These results highlight its accuracy and robustness in detecting XSS and its suitability for real-time applications. Our work presents a comprehensive solution for enhancing web application security by providing a diverse dataset and a high-accuracy detection model with low latency.
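
The three headline metrics in the abstract (accuracy, false positive rate, and processing rate) can be computed as in the minimal Python sketch below. This is an illustration only, not the authors' evaluation code: the function name `evaluate`, the label convention (1 = XSS, 0 = benign), and the keyword-based `toy_predict` stand-in are all assumptions, and the toy predictor is not the CNN-LSTM model described in the paper.

```python
import time
from typing import Callable, Dict, Sequence


def evaluate(predict: Callable[[str], int],
             samples: Sequence[str],
             labels: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, false positive rate, and samples per second.

    Assumed convention (not from the paper): label 1 = XSS, 0 = benign.
    """
    # Time only the prediction loop to estimate the processing rate.
    start = time.perf_counter()
    preds = [predict(s) for s in samples]
    elapsed = time.perf_counter() - start

    # Count the four confusion-matrix cells.
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)

    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # FPR = benign samples wrongly flagged / all benign samples.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "samples_per_second": len(samples) / elapsed if elapsed > 0 else float("inf"),
    }


def toy_predict(sample: str) -> int:
    """Hypothetical stand-in classifier: flags any input containing '<script'."""
    return int("<script" in sample.lower())


if __name__ == "__main__":
    samples = [
        "<script>alert(1)</script>",
        "hello world",
        "<b>bold</b>",
        "<img src=x onerror=alert(1)>",
    ]
    labels = [1, 0, 0, 1]
    print(evaluate(toy_predict, samples, labels))
```

On the four toy samples the stand-in misses the `onerror` payload, which shows how a detector can have a zero false positive rate and still lose accuracy through false negatives. This is why the abstract reports both figures.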

Keywords