Ain Shams Engineering Journal (Jul 2024)
EL-RFHC: Optimized ensemble learners using RFHC for intrusion attacks classification
Abstract
The extensive growth of mobile technology has magnified the use of digital gadgets around the world. This requires a fast interconnecting communication medium to transfer data between devices. Meanwhile, intruders attempt to generate heavy traffic in the network, which leads to data loss. To identify intrusion attacks, ensemble Machine Learning (ML) classifiers are applied using the importance of the various feature variables. However, most transmitted data is high-dimensional, with numerous variables, which increases the execution time needed to classify the attacks. This study introduces a novel approach, the fusion of the Random Forest classifier and High Correlation (RFHC) feature selection, to reduce the number of variables. In addition, the intrusion attack classes contain fewer samples than the normal class, which produces an imbalanced dataset. Hence, the Synthetic Minority Over-Sampling Technique (SMOTE) is used to create a balanced dataset for multi-class classification, while the non-upsampled data is used for binary-class classification. The pre-processed dataset is fed into the ensemble machine learners and an attention-mechanism-based LSTM to classify samples as normal data or as one of various intrusion attacks. This research work reduced the CICIDS2017 dataset's variable dimensions from 71 to 34 using RFHC. The performance results showed that the RF classifier performed best in binary-class classification, with an accuracy of 99.4 %, precision of 99.4 %, average recall of 99.2 %, and average F1-score of 99.6 %, while Extreme Gradient Boosting (XGBoost) performed best in multi-class classification, with an accuracy of 99.7 %, precision of 98.7 %, average recall of 99.5 %, and average F1-score of 99.2 %.
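The abstract does not spell out how RFHC combines its two criteria, so the following is only a minimal sketch of one plausible reading. It assumes that features are first ranked by Random Forest importance (in practice obtained from something like scikit-learn's `RandomForestClassifier.feature_importances_`) and that a feature is then discarded if its absolute Pearson correlation with a higher-ranked, already-kept feature exceeds a threshold. The 0.9 threshold, the function names, and the toy columns (loosely named after CICIDS2017-style flow features) are hypothetical. The sketch also shows the standard SMOTE interpolation step: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It is written in dependency-free Python for illustration; a real pipeline would more likely use scikit-learn and imbalanced-learn's `SMOTE`.

```python
import math
import random
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def rf_high_corr_select(
    columns: Dict[str, List[float]],
    importance: Dict[str, float],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Hypothetical RFHC-style filter: walk features in descending RF
    importance and skip any feature whose |correlation| with an already-kept
    feature exceeds `threshold`."""
    kept: List[str] = []
    for name in sorted(importance, key=importance.get, reverse=True):
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def smote(minority: List[List[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Standard SMOTE interpolation (needs at least two minority samples):
    new = x + gap * (neighbour - x), with gap drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean)
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy columns; total_fwd_packets is almost collinear with flow_duration.
    cols = {
        "flow_duration": [1.0, 2.0, 3.0, 4.0],
        "total_fwd_packets": [2.0, 4.0, 6.0, 8.1],
        "dst_port": [80.0, 443.0, 22.0, 80.0],
    }
    imp = {"flow_duration": 0.5, "total_fwd_packets": 0.3, "dst_port": 0.2}
    print(rf_high_corr_select(cols, imp))  # ['flow_duration', 'dst_port']

    attack_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(len(smote(attack_rows, n_new=6, k=2)))  # 6
```

Ranking by importance before the correlation check means that, of two redundant features, the one the forest found more useful is the one retained, which is the intuition behind pairing the two criteria.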