Improved Feature Selection and Stream Traffic Classification Based on Machine Learning in Software-Defined Networks

Arwa M. Eldhai; Mosab Hamdan; Ahmed Abdelaziz; Ibrahim Abaker Targio Hashem; Sharief F. Babiker; M. N. Marsono; Muzaffar Hamzah; Noor Zaman Jhanjhi

doi:10.1109/ACCESS.2024.3370435

IEEE Access (Jan 2024)

Affiliations

Arwa M. Eldhai: Faculty of Engineering, University of Science and Technology, Khartoum, Sudan
Mosab Hamdan: Interdisciplinary Research Center for Intelligent Secure Systems, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
Ahmed Abdelaziz: Information Technology Department, Khawarizmi International College, Abu Dhabi, United Arab Emirates
Ibrahim Abaker Targio Hashem: Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah, United Arab Emirates
Sharief F. Babiker: Faculty of Electrical and Electronic Engineering, University of Khartoum, Khartoum, Sudan
M. N. Marsono: Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
Muzaffar Hamzah: Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
Noor Zaman Jhanjhi: School of Computer Science (SCS), Taylor’s University, Subang Jaya, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2024.3370435
Journal volume & issue: Vol. 12
pp. 34141 – 34159

Abstract

Traffic classification (TC) in software-defined networks (SDN) using machine learning (ML) appears to be a viable option for improving network management. TC improves SDN operability, while SDN speeds up the feature selection (FS) process, especially when ML is used as the classification mechanism to extract measurements and related information from data arriving at the SDN controller. Despite these advantages, TC and FS tasks still lack adequate support because traffic profiles are frequently similar, which makes classification difficult. Furthermore, combining TC with stream learning (SL) poses numerous challenges. Robust statistical flow features are therefore needed to reduce the overhead of the SDN control plane. Such features could support online feature extraction, handle concept drift, and allow an infinite data stream to be processed with limited resources (time and memory). This paper aims to improve the overall performance of TC by using the SL technique with relevant selected features, alleviating the load on the SDN control plane, as follows. First, an FS mechanism called Boruta is proposed. Second, we propose three streaming-based TC methods for SDN: Hoeffding adaptive trees (HAT), adaptive random forest (ARF), and k-nearest neighbour with adaptive sliding window detector (KNN-ADWIN). These techniques can dynamically handle concept drift and address the problem of memory and time consumption, lowering the overhead of the SDN controller. Third, real and synthetic traffic traces are used to evaluate the performance of the proposed FS and streaming TC methods. According to simulation results, the Boruta FS technique can achieve up to 95% average accuracy and up to 87% average per-application precision, recall, and F-score, outperforming other works in the literature. Furthermore, results for the SL techniques show that the proposed methods can maintain up to 85% average accuracy, 78% kappa, and average rates of 62-88% in precision, recall, and F-score. In addition, compared with ARF and KNN-ADWIN, HAT consumes less time and memory (15 s and 105 KB, respectively).
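
To make the stream-learning setting described in the abstract more concrete, the following is a minimal sketch of test-then-train (prequential) evaluation on a data stream with concept drift. It is not the authors' implementation. It assumes a fixed-size sliding window as a simplified stand-in for the adaptive window of KNN-ADWIN, a synthetic two-feature stream in place of real traffic traces, and hypothetical names throughout (`make_drifting_stream`, `knn_predict`, `cohen_kappa`, `prequential`). It reports accuracy and Cohen's kappa, two of the metrics quoted above.

```python
import math
import random
from collections import Counter, deque


def make_drifting_stream(n=2000, drift_at=1000, seed=0):
    """Yield (features, label) pairs; the two class centres swap at drift_at."""
    rng = random.Random(seed)
    centres = {0: (0.0, 0.0), 1: (4.0, 4.0)}  # hypothetical 2-D flow features
    for i in range(n):
        label = rng.randint(0, 1)
        # After the drift point each class takes over the other's centre.
        cx, cy = centres[label if i < drift_at else 1 - label]
        yield (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label


def knn_predict(window, x, k=3):
    """Majority vote among the k nearest samples stored in the window."""
    if not window:
        return None
    nearest = sorted(window, key=lambda s: math.dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def prequential(stream, window_size=200, k=3):
    """Predict each sample first, then learn from it (test-then-train)."""
    # Bounded memory: the deque silently drops the oldest sample when full.
    # Assumption: fixed window size, unlike ADWIN, which adapts it.
    window = deque(maxlen=window_size)
    y_true, y_pred = [], []
    for x, y in stream:
        pred = knn_predict(window, x, k)
        if pred is not None:  # the very first sample has nothing to compare to
            y_true.append(y)
            y_pred.append(pred)
        window.append((x, y))
    if not y_true:
        raise ValueError("stream must contain at least two samples")
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, cohen_kappa(y_true, y_pred)


if __name__ == "__main__":
    acc, kappa = prequential(make_drifting_stream())
    print(f"accuracy={acc:.3f} kappa={kappa:.3f}")
```

In this sketch the bounded window caps memory use, and old samples age out after the drift point so the classifier can recover. An adaptive detector such as ADWIN would instead shrink the window when it detects a change in the data distribution.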

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
