Scientific Reports (Aug 2024)

Large-scale IoT attack detection scheme based on LightGBM and feature selection using an improved salp swarm algorithm

  • Weizhe Chen,
  • Hongyu Yang,
  • Lihua Yin,
  • Xi Luo

DOI
https://doi.org/10.1038/s41598-024-69968-2
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 25

Abstract


With the rapid growth of the Internet of Things (IoT), the number of interconnected devices that send and exchange critical data across networks has surged. At the same time, attacks on IoT systems are becoming steadily more frequent, posing a persistent risk to the security and privacy of IoT data. Efficient methods for detecting cyber threats in IoT environments are therefore essential. However, many existing network attack detection schemes suffer from insufficient detection accuracy, the curse of dimensionality caused by very high-dimensional data, and the slow running time of complex models. Among the many possible remedies, using metaheuristic algorithms to select features from network data is an effective strategy. This study introduces GQBWSSA, an improved metaheuristic that extends the Salp Swarm Algorithm (SSA) with several strategy improvements. Built on this algorithm, a threshold-voting-based feature selection framework yields an optimized feature subset. This procedure reduces the dimensionality of the data, mitigating the adverse effects of high dimensionality while retaining the most significant information. The selected features are then fed to the LightGBM algorithm, forming a lightweight and efficient ensemble learning scheme for IoT attack detection. Experiments on the NSL-KDD and CICIoT2023 datasets show that the enhanced metaheuristic outperforms recent metaheuristic algorithms in feature selection.
Compared with currently popular ensemble learning solutions, the overall scheme performs strongly on several key indicators, including accuracy, precision, and training and detection time. On the large-scale CICIoT2023 dataset in particular, it achieves an accuracy of 99.70% in binary classification and 99.41% in multi-class classification.
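The abstract describes a wrapper pipeline in which a salp-swarm metaheuristic searches for feature subsets and a threshold vote consolidates them before LightGBM training. The GQBWSSA strategy improvements and the exact voting rule are not given here, so the minimal sketch below implements only the canonical SSA update rules: a leader that moves around the best-known "food source" with a step scaled by a decaying coefficient c1, and followers that take the midpoint between their own position and their predecessor's. It also assumes a 0.5 cut-off to binarise positions into feature masks, plus a hypothetical vote that keeps a feature selected in at least a given fraction of independent runs. The names `ssa_select`, `threshold_vote`, and `toy_fitness` are illustrative, as is the toy relevant-feature set. A real setup would score each mask with a trained classifier's validation error.

```python
import math
import random


def to_mask(position, cut=0.5):
    """Binarise a continuous salp position: feature j is selected if x_j > cut (assumed rule)."""
    return [1 if x > cut else 0 for x in position]


def ssa_select(fitness, n_features, n_salps=20, max_iter=50, seed=None):
    """Canonical Salp Swarm Algorithm on [0, 1]^d used as a wrapper feature selector.

    `fitness` maps a 0/1 mask to a value to minimise. Returns (best_mask, best_fitness).
    """
    rng = random.Random(seed)
    lb, ub = 0.0, 1.0
    salps = [[rng.random() for _ in range(n_features)] for _ in range(n_salps)]
    # The "food source" F is the best position found so far.
    food = min(salps, key=lambda p: fitness(to_mask(p)))[:]
    food_fit = fitness(to_mask(food))
    for l in range(1, max_iter + 1):
        # c1 decays from about 2 towards 0: exploration first, exploitation later.
        c1 = 2 * math.exp(-((4 * l / max_iter) ** 2))
        for i in range(n_salps):
            for j in range(n_features):
                if i == 0:
                    # Leader: random step of size c1 * ((ub - lb) * c2 + lb) around F.
                    c2, c3 = rng.random(), rng.random()
                    step = c1 * ((ub - lb) * c2 + lb)
                    salps[i][j] = food[j] + step if c3 >= 0.5 else food[j] - step
                else:
                    # Follower: midpoint of its own and the preceding salp's position.
                    salps[i][j] = (salps[i][j] + salps[i - 1][j]) / 2
                salps[i][j] = min(max(salps[i][j], lb), ub)  # keep within bounds
        # Update the food source after the whole chain has moved.
        for pos in salps:
            f = fitness(to_mask(pos))
            if f < food_fit:
                food, food_fit = pos[:], f
    return to_mask(food), food_fit


def threshold_vote(masks, threshold=0.5):
    """Hypothetical vote: keep feature j if it was selected in at least `threshold` of the runs."""
    runs = len(masks)
    return [j for j in range(len(masks[0]))
            if sum(m[j] for m in masks) / runs >= threshold]


# Hypothetical toy problem: features 0, 2 and 5 are "relevant". In practice the
# error term would come from a classifier trained on the selected features.
RELEVANT = {0, 2, 5}
N_FEATURES = 8


def toy_fitness(mask, alpha=0.99):
    """Weighted sum of a stand-in error rate and the fraction of features kept."""
    selected = {j for j, bit in enumerate(mask) if bit}
    error = 1 - len(selected & RELEVANT) / len(RELEVANT)  # stand-in for error rate
    return alpha * error + (1 - alpha) * len(selected) / N_FEATURES


if __name__ == "__main__":
    masks = [ssa_select(toy_fitness, N_FEATURES, seed=s)[0] for s in range(5)]
    print("per-run masks:", masks)
    print("voted features:", threshold_vote(masks, threshold=0.6))
```

Running several seeds and voting trades extra search time for a subset that is more stable than any single run. The retained feature indices would then be used to slice the training data before fitting a LightGBM model.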

Keywords