Journal of King Saud University: Computer and Information Sciences (Mar 2023)

An empirical assessment of ensemble methods and traditional machine learning techniques for web-based attack detection in industry 5.0

  • Oumaima Chakir,
  • Abdeslam Rehaimi,
  • Yassine Sadqi,
  • El Arbi Abdellaoui Alaoui,
  • Moez Krichen,
  • Gurjot Singh Gaba,
  • Andrei Gurtov

Journal volume & issue
Vol. 35, no. 3
pp. 103 – 119

Abstract

Software has become a profitable and popular target for cybercriminals, who deliberately exploit web-based vulnerabilities to execute attacks that might jeopardize essential industry 5.0 features. Several machine learning-based techniques have been developed in the literature to identify these types of attacks. In contrast to single classifiers, ensemble methods have not been evaluated empirically. To the best of our knowledge, this work is the first empirical evaluation of both homogeneous and heterogeneous ensemble approaches compared to single classifiers for web-based attack detection in industry 5.0, utilizing two of the most realistic public web-based attack datasets. The authors divided the experiment into three main phases. In the first phase, they evaluated the performance of five well-established supervised machine learning (ML) classifiers. In the second phase, they constructed a heterogeneous ensemble of the three best-performing ML algorithms using max voting and stacking methods. In the third phase, they used four well-known homogeneous ensembles to evaluate the performance of the bagging and boosting methods. The results based on the ECML/PKDD 2007 and CSIC HTTP 2010 datasets revealed that bagging, particularly Random Forest, outperformed single classifiers in terms of accuracy, precision, F-value, FPR, and area under the ROC curve, with values of 99.597%, 98.274%, 99.129%, 0.523%, and 100 on ECML/PKDD 2007 and 99.867%, 99.867%, 99.867%, 0.267%, and 100 on CSIC HTTP 2010, respectively. In contrast, single classifiers performed better than boosting and stacking. However, in terms of FPR, boosting outperformed single classifiers. Max voting is appropriate when accuracy, precision, and FPR are the primary concerns, whereas single classifiers can be employed when recall, FNR, and training and prediction times are critical. In terms of training time, ensemble approaches are more likely to be affected by data volume than single classifiers. The paper's findings will help security researchers and practitioners identify the most efficient learning techniques for securing web applications.
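
As a rough illustration of two ideas named in the abstract, max voting over several classifiers and the FPR/FNR metrics, the following is a minimal Python sketch. It is not the authors' code: the function names (`max_vote`, `binary_metrics`), the three hypothetical models, and the toy labels are assumptions made for illustration only, and label 1 is assumed to mean "attack".

```python
from collections import Counter


def max_vote(predictions_per_model):
    """Hard (majority) voting: for each sample, keep the most common label.

    `predictions_per_model` is a list of equal-length label lists, one per
    model. With an odd number of models and binary labels there are no ties.
    """
    return [
        Counter(sample_preds).most_common(1)[0][0]
        for sample_preds in zip(*predictions_per_model)
    ]


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix metrics for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "fpr": _ratio(fp, fp + tn),  # false positive rate
        "fnr": _ratio(fn, fn + tp),  # false negative rate
    }


# Toy data (hypothetical, not taken from the paper): 1 = attack, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [1, 1, 0, 0, 0, 1, 0, 1]
model_b = [1, 0, 1, 0, 0, 0, 0, 1]
model_c = [1, 1, 1, 0, 1, 0, 0, 0]

voted = max_vote([model_a, model_b, model_c])
metrics_a = binary_metrics(y_true, model_a)
metrics_voted = binary_metrics(y_true, voted)
```

In this toy case each hypothetical model makes two mistakes on different samples, so the majority vote recovers every true label. Real classifiers tend to make correlated errors, which is why an empirical comparison such as the one reported in the abstract is needed.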

Keywords