IoT information theft prediction using ensemble feature selection

Joffrey L. Leevy; John Hancock; Taghi M. Khoshgoftaar; Jared M. Peterson

doi:10.1186/s40537-021-00558-z

Journal of Big Data (Jan 2022)

Affiliations

Joffrey L. Leevy: Florida Atlantic University
John Hancock: Florida Atlantic University
Taghi M. Khoshgoftaar: Florida Atlantic University
Jared M. Peterson: Florida Atlantic University

DOI: https://doi.org/10.1186/s40537-021-00558-z
Journal volume & issue: Vol. 9, no. 1, pp. 1–48

Abstract

Recent years have seen a proliferation of Internet of Things (IoT) devices and an associated security risk from an increasing volume of malicious traffic worldwide. For this reason, datasets such as Bot-IoT were created to train machine learning classifiers to identify attack traffic in IoT networks. In this study, we build predictive models with Bot-IoT to detect attacks represented by dataset instances from the Information Theft category, as well as dataset instances from the data exfiltration and keylogging subcategories. Our contribution centers on evaluating the effect of ensemble feature selection techniques (FSTs) on classification performance for these specific attack instances. A group or ensemble of FSTs will often perform better than the best individual technique. The classifiers that we use are a diverse set of four ensemble learners (LightGBM, CatBoost, XGBoost, and random forest (RF)) and four non-ensemble learners (logistic regression (LR), decision tree (DT), Naive Bayes (NB), and a multi-layer perceptron (MLP)). The metrics used for evaluating classification performance are area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPRC). For the most part, we determined that our ensemble FSTs do not affect classification performance but are beneficial because feature reduction eases computational burden and provides insight through improved data visualization.
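
As a rough illustration of what an ensemble of FSTs can look like, the sketch below combines several feature rankings by simple agreement: a feature is kept if it appears in the top k of at least a minimum number of rankings. The abstract does not state the authors' combination rule, so this voting scheme, the function name `ensemble_select`, and the feature names are all assumptions made for illustration only.

```python
from collections import Counter


def ensemble_select(rankings, top_k, min_votes):
    """Keep features found in the top_k of at least min_votes rankings.

    Hypothetical helper: the agreement rule is an assumption, not the
    method described in the paper.
    """
    votes = Counter()
    for ranking in rankings:
        # Each ranking lists feature names from most to least important.
        votes.update(set(ranking[:top_k]))
    # Order by descending vote count, then by name, for a stable result.
    return sorted(
        (f for f, v in votes.items() if v >= min_votes),
        key=lambda f: (-votes[f], f),
    )


# Illustrative rankings from three hypothetical feature selection techniques.
rankings = [
    ["bytes", "rate", "dur", "pkts", "proto"],
    ["rate", "bytes", "pkts", "dur", "proto"],
    ["dur", "proto", "bytes", "rate", "pkts"],
]

selected = ensemble_select(rankings, top_k=3, min_votes=2)
print(selected)  # ['bytes', 'dur', 'rate']
```

Raising `min_votes` makes the selection stricter; with `min_votes=3` only features that every ranking places in its top three survive.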

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
