Digital Communications and Networks (Feb 2023)

An ensemble deep learning model for cyber threat hunting in industrial internet of things

  • Abbas Yazdinejad,
  • Mostafa Kazemi,
  • Reza M. Parizi,
  • Ali Dehghantanha,
  • Hadis Karimipour

Journal volume & issue
Vol. 9, no. 1
pp. 101 – 110

Abstract

Read online

By the emergence of the fourth industrial revolution, interconnected devices and sensors generate large-scale, dynamic, and inharmonious data in Industrial Internet of Things (IIoT) platforms. Such vast heterogeneous data increase the challenges of security risks and data analysis procedures. As IIoT grows, cyber-attacks become more diverse and complex, making existing anomaly detection models less effective to operate. In this paper, an ensemble deep learning model that uses the benefits of the Long Short-Term Memory (LSTM) and the Auto-Encoder (AE) architecture to identify out-of-norm activities for cyber threat hunting in IIoT is proposed. In this model, the LSTM is applied to create a model on normal time series of data (past and present data) to learn normal data patterns and the important features of data are identified by AE to reduce data dimension. In addition, the imbalanced nature of IIoT datasets has not been considered in most of the previous literature, affecting low accuracy and performance. To solve this problem, the proposed model extracts new balanced data from the imbalanced datasets, and these new balanced data are fed into the deep LSTM AE anomaly detection model. In this paper, the proposed model is evaluated on two real IIoT datasets -Gas Pipeline (GP) and Secure Water Treatment (SWaT) that are imbalanced and consist of long-term and short-term dependency on data. The results are compared with conventional machine learning classifiers, Random Forest (RF), Multi-Layer Perceptron (MLP), Decision Tree (DT), and Super Vector Machines (SVM), in which higher performance in terms of accuracy is obtained, 99.3% and 99.7% based on GP and SWaT datasets, respectively. Moreover, the proposed ensemble model is compared with advanced related models, including Stacked Auto-Encoders (SAE), Naive Bayes (NB), Projective Adaptive Resonance Theory (PART), Convolutional Auto-Encoder (C-AE), and Package Signatures (PS) based LSTM (PS-LSTM) model.

Keywords