Evaluation of the Performance of Unsupervised Learning Algorithms for Intrusion Detection in Unbalanced Data Environments

Gutierrez-Portela Fernando; Almenares Mendoza Florina; Calderon-Benavides Liliana

doi:10.1109/ACCESS.2024.3516615

IEEE Access (Jan 2024)

Affiliations

Gutierrez-Portela Fernando: ORCiD; Aqua Research Group, Cooperative University of Colombia, Ibagué, Colombia
Almenares Mendoza Florina: Department of Telematics Engineering, Universidad Carlos III de Madrid (UC3M), Madrid, Spain
Calderon-Benavides Liliana: ORCiD; Information Technologies Academic Unit, Autonomous University of Bucaramanga, Bucaramanga, Colombia

Journal volume & issue: Vol. 12, pp. 190134–190157

Abstract

This study evaluated unsupervised machine learning algorithms for intrusion detection in unbalanced data environments using the BoT-IoT dataset. Algorithms including K-means++, DBSCAN, Local Outlier Factor (LOF), and Isolation Forest (I-forest) were assessed with purity, homogeneity, completeness, V-measure, and adjusted mutual information to determine how effectively they detect attacks such as DDoS, DoS, and reconnaissance. Methods for selecting the optimal number of clusters were also explored, and principal component analysis (PCA) was applied to explain data variability. K-means++ achieved 95% purity, with prediction accuracies of 95% for normal data and 99% for abnormal data. I-forest delivered similar results and was the most computationally efficient, consuming only 10% of CPU resources compared with 16% for the other algorithms. These findings highlight I-forest’s effectiveness and efficiency in intrusion detection, making it a viable option for cybersecurity environments with limited resources and significant data imbalance.
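
As an illustration of the purity metric named in the abstract, the following minimal sketch uses the standard definition: the fraction of samples that fall in the majority class of their assigned cluster. The function name `purity` and the toy labels are hypothetical and are not taken from the paper or from the BoT-IoT dataset.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_ids):
    """Fraction of samples that belong to the majority class of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("inputs must be non-empty")

    # Count how often each true class appears inside each cluster.
    per_cluster = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        per_cluster[cluster][label] += 1

    # Sum the size of the majority class in every cluster.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical, imbalanced toy data: 8 attack flows and 2 normal flows.
y_true = ["attack"] * 8 + ["normal"] * 2
clusters = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

score = purity(y_true, clusters)           # two clusters
trivial = purity(y_true, [0] * len(y_true))  # everything in one cluster
```

On this toy data, placing every sample in a single cluster still gives a purity equal to the majority-class share, which is one reason purity on imbalanced data is usually read alongside measures such as homogeneity, completeness, and V-measure.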

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
