Developing an Efficient Feature Engineering and Machine Learning Model for Detecting IoT-Botnet Cyber Attacks

Mrutyunjaya Panda; Abd Allah A. Mousa; Aboul Ella Hassanien

doi:10.1109/ACCESS.2021.3092054

IEEE Access (Jan 2021)

Affiliations

Mrutyunjaya Panda: ORCiD; Department of Computer Science and Applications, Utkal University, Odisha, India
Abd Allah A. Mousa: ORCiD; Department of Mathematics and Statistics, College of Science, Taif University, Taif, Saudi Arabia
Aboul Ella Hassanien: Scientific Research Group in Egypt (SRGE), Giza, Egypt

DOI: https://doi.org/10.1109/ACCESS.2021.3092054
Journal volume & issue: Vol. 9
pp. 91038 – 91052

Abstract

The proliferation of Internet of Things (IoT) systems and smart digital devices has made them targets of network attacks. Botnets are vectors through which attackers seize control of IoT systems and conduct malicious activities. To confront this challenge, efficient machine learning and deep learning with suitable feature engineering are proposed to detect such attacks and protect the network from them in the future. For efficient detection of cyber attacks, the representative dataset must be well structured for training the model and then validating the proposed system, so that an optimal security model can be developed. In this research, we used UNSW-NB15, a new IoT-Botnet dataset that is noisy and imbalanced, to classify cyber attacks. K-Medoid sampling and scatter search-based feature engineering techniques are used to obtain a representative dataset with optimal feature subsets. To validate the proposed methodologies, three recent machine learning (ML) methods are employed: (i) JChaid*, a recent upgrade of the Chi-square automatic interaction detection (CHAID) decision tree; (ii) A2DE, a semi-naive Bayesian averaged two-dependence estimator; and (iii) HGC, a hybrid of a genetic algorithm with K-means clustering. Two deep learning (DL) classifiers are also employed: (i) a deep multilayer perceptron (DMLP) and (ii) a convolutional neural network (CNN). Extensive experimental analysis shows that the scatter search-based DMLP classifier outperforms the other competing models in terms of (i) the highest detection rate, with 100% accuracy, 100% macro-averaged precision, 100% macro-averaged recall, and 100% macro-averaged F1-score, and (ii) low computational complexity, with the least training time of 4.7 seconds and testing time of 0.61 seconds.
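The macro-averaged metrics named in the abstract weight every class equally, which matters on an imbalanced dataset where plain accuracy can hide poor detection of rare attack classes. The following is a minimal sketch of how such scores are usually computed, not the authors' evaluation code; the function name `macro_scores` and the toy labels are hypothetical and are not drawn from UNSW-NB15.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # pred was assigned to a sample of another class
            fn[truth] += 1  # a sample of class truth was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    # Unweighted mean over classes: each class counts the same,
    # regardless of how many samples it has.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy labels (illustrative only).
y_true = ["attack", "attack", "attack", "attack", "normal", "normal"]
y_pred = ["attack", "attack", "attack", "normal", "normal", "normal"]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"macro precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Under this definition, a classifier reaches 100% on all three macro-averaged scores only when every class is predicted without error, which is the result the abstract reports for the scatter search-based DMLP.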

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
