Learning From Few Cyber-Attacks: Addressing the Class Imbalance Problem in Machine Learning-Based Intrusion Detection in Software-Defined Networking

Seyed Mohammad Hadi Mirsadeghi; Hayretdin Bahsi; Risto Vaarandi; Wissem Inoubli

doi:10.1109/ACCESS.2023.3341755

IEEE Access (Jan 2023)

Affiliations

Seyed Mohammad Hadi Mirsadeghi: Department of Software Science, Centre for Digital Forensics and Cyber Security, Tallinn University of Technology, Tallinn, Estonia
Hayretdin Bahsi: Department of Software Science, Centre for Digital Forensics and Cyber Security, Tallinn University of Technology, Tallinn, Estonia
Risto Vaarandi: Department of Software Science, Centre for Digital Forensics and Cyber Security, Tallinn University of Technology, Tallinn, Estonia
Wissem Inoubli: CNRS, UMR 8188, Centre de Recherche en Informatique de Lens (CRIL), Artois University, Lens, France

DOI: https://doi.org/10.1109/ACCESS.2023.3341755
Journal volume & issue: Vol. 11, pp. 140428–140442

Abstract

The class imbalance problem degrades the performance of learning algorithms on minority classes, which may represent more severe attacks than the majority classes. This study investigates the benefits of balancing strategies and imbalanced learning approaches on intrusion data from Software-Defined Networking (SDN). Although the research community has covered the imbalance problem in machine learning-based intrusion detection, addressing it in SDN is novel. We address the problem on InSDN, the only publicly available SDN intrusion detection dataset at the time of writing, which makes the results relevant to future research on intrusion detection in SDN. We apply both data-level and classifier-level techniques. Our research objective is to determine suitable methods for addressing the class imbalance problem in machine learning-based intrusion detection in SDN. We propose custom deep learning architectures based on Generative Adversarial Networks (GANs) and Siamese Neural Networks for generative modeling and similarity-based intrusion detection. This paper provides benchmarking results from classification with Random Oversampling (ROS), SMOTE, GANs, weighted Random Forest, and Siamese-based one-shot learning. We find that Random Forest (RF) outperforms deep learning models in the classification of minority class instances, which supports the notion that RF handles class imbalance well. We also observe that the widely used balancing techniques ROS and SMOTE drastically decrease the False Positive Rate (FPR) but increase the False Negative Rate (FNR) in the classification of minority classes. In conclusion, while data-level methods improve the classification performance of deep learning models, they degrade RF’s performance, i.e., they cause a higher number of false predictions. Therefore, RF does not need additional balancing strategies to achieve higher performance.
Although this work addresses the class imbalance problem in SDN intrusion data, it provides a well-designed benchmark that can serve as a template for any network intrusion detection data. Thus, it may have a significant impact on future studies in this domain.
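
To make two of the terms in the abstract concrete, the following is a minimal sketch of Random Oversampling (ROS) and of the one-vs-rest FPR and FNR for a single class. It is not the authors' implementation. The function names `random_oversample` and `per_class_rates`, the labels `"normal"` and `"attack"`, and the toy data are all hypothetical and are not taken from the paper or from InSDN.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate rows of smaller classes at random until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample with replacement from the class's own rows.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def per_class_rates(y_true, y_pred, label):
    """One-vs-rest False Positive Rate and False Negative Rate for `label`."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == label and p == label:
            tp += 1
        elif t == label:
            fn += 1
        elif p == label:
            fp += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Toy, made-up data: four majority rows and one minority row.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["normal", "normal", "normal", "normal", "attack"]
X_bal, y_bal = random_oversample(X, y)
class_counts = Counter(y_bal)

# Toy, made-up predictions used only to exercise the metric.
y_true = ["attack", "attack", "normal", "normal", "normal", "normal"]
y_pred = ["attack", "normal", "normal", "normal", "attack", "normal"]
fpr, fnr = per_class_rates(y_true, y_pred, "attack")
```

In this sketch ROS only copies existing minority rows, whereas SMOTE would synthesize new ones by interpolating between neighbours. Reporting FPR and FNR per class, rather than overall accuracy, is what lets the effect of a balancing method on minority classes be seen.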

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
