Detecting cybersecurity attacks across different network features and learners

Joffrey L. Leevy; John Hancock; Richard Zuech; Taghi M. Khoshgoftaar

doi:10.1186/s40537-021-00426-w

Journal of Big Data (Feb 2021)

Affiliations

Joffrey L. Leevy: Florida Atlantic University
John Hancock: Florida Atlantic University
Richard Zuech: Florida Atlantic University
Taghi M. Khoshgoftaar: Florida Atlantic University

Journal volume & issue: Vol. 8, no. 1, pp. 1–29

Abstract

Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate the effect of ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic attack types. Our contribution is centered around answers to three research questions. The first question is, “Does feature selection impact the performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact the performance of LightGBM and Catboost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier (Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), Catboost, LightGBM, or XGBoost) significantly impact performance in terms of AUC and F1-score?” All three research questions are answered in the affirmative, and the answers provide valuable, practical information for the development of an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.
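
For readers unfamiliar with the two metrics named in the research questions, the following is a minimal sketch of how AUC and F1-score can be computed for a binary attack/normal labeling. It is not the authors' code: the function names `f1_score` and `auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration, and the pairwise AUC formula shown is only practical for small samples, not for a dataset the size of CSE-CIC-IDS2018.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * tp / (2 * tp + fp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative.

    Ties count as half a win. This O(n_pos * n_neg) form is for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = attack, 0 = normal traffic.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# Assumed threshold of 0.5 turns scores into hard predictions for F1.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"AUC = {auc(y_true, scores):.3f}")
print(f"F1  = {f1_score(y_true, y_pred):.3f}")
```

AUC is computed from the raw scores and so does not depend on a threshold, whereas F1-score depends on the hard predictions and therefore on the threshold chosen.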

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
