The quality improvement method for detecting attacks on web applications using pre-trained natural language models

Kovaleva, Olga A.; Samokhvalov, Alexey Vladimirovich; Liashkov, Mikhail A.; Pchelintsev, Sergey Yurevich

doi:10.18500/1816-9791-2024-24-3-442-451

Известия Саратовского университета. Новая серия. Серия Математика. Механика. Информатика (Aug 2024)

Affiliations

Kovaleva, Olga A.: Derzhavin Tambov State University
Samokhvalov, Alexey Vladimirovich: Derzhavin Tambov State University
Liashkov, Mikhail A.: Derzhavin Tambov State University
Pchelintsev, Sergey Yurevich: Derzhavin Tambov State University

DOI: https://doi.org/10.18500/1816-9791-2024-24-3-442-451
Journal volume & issue: Vol. 24, no. 3, pp. 442–451

Abstract

This paper explores the use of deep learning techniques to improve the performance of web application firewalls (WAFs), describes a specific method for doing so, and presents the results of testing it on the publicly available CSIC 2010 dataset. Most web application firewalls rely on rules compiled by experts. At run time, a firewall inspects the HTTP requests exchanged between client and server in order to detect attacks and block potential threats. Drafting rules by hand takes up experts' time, while ready-made rule sets distributed to users do not take into account the specifics of a particular application, so they produce many false positives and miss many network attacks. In recent years, pre-trained language models have led to significant improvements across a diverse set of natural language processing tasks because they are able to transfer knowledge. The article describes how these approaches can be adapted to information security: a pre-trained language model is used as a feature extractor that maps an HTTP request to a feature vector, and these vectors are then used to train a classifier. We propose a solution that consists of two stages. In the first stage, we build a deep pre-trained language model from normal HTTP requests to the web application. In the second stage, we use this model as a feature extractor and train a one-class classifier. Both stages are performed separately for each application. The experimental results show that the proposed approach significantly outperforms the classical rule-based approach of ModSecurity configured with the OWASP CRS, and that it does not require a security expert to define trigger rules.

Published in Известия Саратовского университета. Новая серия. Серия Математика. Механика. Информатика

ISSN: 1816-9791 (Print); 2541-9005 (Online)
Publisher: Saratov State University
Country of publisher: Russian Federation
LCC subjects: Science: Mathematics
Website: https://mmi.sgu.ru/
