IEEE Access (Jan 2021)

Developing an Efficient Feature Engineering and Machine Learning Model for Detecting IoT-Botnet Cyber Attacks

  • Mrutyunjaya Panda,
  • Abd Allah A. Mousa,
  • Aboul Ella Hassanien

DOI
https://doi.org/10.1109/ACCESS.2021.3092054
Journal volume & issue
Vol. 9
pp. 91038 – 91052

Abstract

Read online

The proliferation of Internet of Things (IoT) systems and smart digital devices, has perceived them targeted by network attacks. Botnets are vectors buttoned up which the attackers grapple the control of IoT systems and comportment venomous activities. To confront this challenge, efficient machine learning and deep learning with suitable feature engineering are suggested to detect and protect the network from such vulnerabilities in the future. For the efficient detection of cyber attacks, the representative dataset shall be well-structured for training the model and then validating the proposed system to develop an optimal security model. In this research, we used the UNSW-NB15, a new IoT-Botnet dataset (a noisy and imbalanced dataset) to classify cyber-attacks. K-Medoid sampling and scatter search-based feature engineering techniques are used to obtain a representative dataset with optimal feature subsets. To validate the proposed methodologies, three most recent machine learning (ML) methods including (i) JChaid*- a recent upgrade version to Chi-square automatic interaction detection (CHAID) decision tree-based, (ii) A2DE (a semi-naive Bayesian averaged two-dependence estimator), & (iii) HGC- a hybrid of Genetic algorithm with K-means clustering and two deep learning (DL) methods such as (i) Deep Multilayer perceptron (DMLP) & (ii) Convolutional neural network (CNN) based classifiers are employed. From the extensive experimental analysis, it is pronounced that scatter search-based DMLP classifier outperforms the other competing models in terms of (i) highest detection rate with100% accuracy, 100% macro-averaged precision, 100% macro-averaged recall & 100% macro-averaged F1-score and (ii) low computational complexity with the least training time of 4.7 seconds & testing time of 0.61 seconds.

Keywords