IEEE Access (Jan 2024)

Building a Cloud-IDS by Hybrid Bio-Inspired Feature Selection Algorithms Along With Random Forest Model

  • Mhamad Bakro,
  • Rakesh Ranjan Kumar,
  • Mohammad Husain,
  • Zubair Ashraf,
  • Arshad Ali,
  • Syed Irfan Yaqoob,
  • Mohammad Nadeem Ahmed,
  • Nikhat Parveen

DOI
https://doi.org/10.1109/ACCESS.2024.3353055
Journal volume & issue
Vol. 12
pp. 8846 – 8874

Abstract

Cloud computing is now widely adopted across many domains, but its inherent security vulnerabilities pose significant risks. Intrusion detection systems (IDS) therefore play a pivotal role in identifying malicious activity within a cloud system. Large volumes of network traffic data often contain redundant and irrelevant features that degrade classifier performance, and processing such volumes increases the complexity and time cost of cloud intrusion detection. To improve IDS performance, this study proposes a hybrid feature selection approach that combines two bio-inspired algorithms, the grasshopper optimization algorithm (GOA) and the genetic algorithm (GA), for a more efficient search for optimal feature subsets. A random forest (RF) classifier is then trained on the selected features. The proposal also addresses class imbalance with a hybrid resampling approach: over-sampling the minority classes with the adaptive synthetic (ADASYN) algorithm and applying random under-sampling (RUS) to the majority class as needed. This combined strategy affects every class, raising the true positive rate (TPR) while lowering the false positive rate (FPR), and thereby improves overall system performance. The approach was evaluated on three datasets, UNSW-NB15, CIC-DDoS2019, and CIC Bell DNS EXF 2021, with recorded accuracies of 98%, 99%, and 92%, respectively. The hybrid feature selection-based IDS demonstrated superior performance in multi-class classification, along with strong results for individual classes within the datasets. With the random forest classifier, the proposed strategy clearly outperformed other classifiers, including SVM, LR, FLN, LSTM, AlexNet, DNN, DBN, DT, and XGBoost, and its performance remained consistent when benchmarked against contemporary state-of-the-art methods across multiple evaluation metrics.
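
The abstract reports TPR and FPR for individual classes in a multi-class setting. As a minimal sketch of how such per-class rates are commonly derived (a one-vs-rest reading of the predictions, which is an assumption here and not necessarily the authors' exact procedure), the following uses only the Python standard library. The function name `per_class_rates` and the toy labels are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter


def per_class_rates(y_true, y_pred):
    """Return {label: (tpr, fpr)}, treating each class one-vs-rest.

    Hypothetical helper for illustration; not the paper's code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    total = len(y_true)
    rates = {}
    for label in labels:
        tp = pairs[(label, label)]
        fn = sum(c for (t, p), c in pairs.items() if t == label and p != label)
        fp = sum(c for (t, p), c in pairs.items() if t != label and p == label)
        tn = total - tp - fn - fp
        # Guard against classes with no positives or no negatives.
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        rates[label] = (tpr, fpr)
    return rates


# Toy labels, invented for this example only.
y_true = ["normal", "normal", "normal", "dos", "dos", "exploit"]
y_pred = ["normal", "normal", "dos", "dos", "dos", "normal"]
rates = per_class_rates(y_true, y_pred)
for label, (tpr, fpr) in rates.items():
    print(f"{label}: TPR={tpr:.2f} FPR={fpr:.2f}")
```

In this toy data the rare "exploit" class is never detected, which is the kind of per-class weakness that overall accuracy hides and that resampling the minority classes is meant to reduce.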

Keywords