Standard Latent Space Dimension for Network Intrusion Detection Systems Datasets

Ricardo Flores Moyano; Alejandro Duque; Daniel Riofrio; Noel Perez; Diego Benitez; Maria Baldeon-Calisto; David Fernandez

doi:10.1109/ACCESS.2023.3283567

IEEE Access (Jan 2023)

Affiliations

Ricardo Flores Moyano: Colegio de Ciencias e Ingenierías “El Politécnico,” Universidad San Francisco de Quito (USFQ), Quito, Ecuador
Alejandro Duque: Colegio de Ciencias e Ingenierías “El Politécnico,” Universidad San Francisco de Quito (USFQ), Quito, Ecuador
Daniel Riofrio: Colegio de Ciencias e Ingenierías “El Politécnico,” Universidad San Francisco de Quito (USFQ), Quito, Ecuador
Noel Perez: Colegio de Ciencias e Ingenierías “El Politécnico,” Universidad San Francisco de Quito (USFQ), Quito, Ecuador
Diego Benitez: Colegio de Ciencias e Ingenierías “El Politécnico,” Universidad San Francisco de Quito (USFQ), Quito, Ecuador
Maria Baldeon-Calisto: Colegio de Ciencias e Ingenierías “El Politécnico,” Universidad San Francisco de Quito (USFQ), Quito, Ecuador
David Fernandez: Departamento de Ingeniería de Sistemas Telemáticos, Universidad Politécnica de Madrid, Madrid, Spain

Journal volume & issue: Vol. 11
pp. 57240 – 57252

Abstract

Machine learning is a branch of artificial intelligence that gives computers the ability to create or improve algorithms by learning directly from data, without being explicitly programmed. It is widely used in automation and decision-making tasks in fields such as image and speech recognition, sentiment analysis, and self-driving cars. However, its application to communication networks is limited by a lack of appropriate research resources, such as rich training datasets or a standard set of features. In this context, a standard latent space dimension is proposed, obtained through an autoencoder-based dimensionality reduction process. Different network security datasets are projected onto a lower-dimensional space to determine a standard, or convergent, dimension. The convergent dimension is identified as the threshold above which further increases in the latent space dimension yield diminishing returns in the autoencoder loss. The experimental validation showed that four machine learning classification models trained on a standard latent space of ten dimensions performed as well, in terms of F1-score and accuracy, as models trained on the non-reduced versions of the datasets. Furthermore, a Wilcoxon statistical test showed that the mean accuracy of all classification models trained with the standard latent space dimension differed by less than 0.0235 from that of the models trained with the original inputs. This negligible difference in accuracy is a significant outcome because researchers can perform experiments using only the latent space, with confidence that the performance of ML models will not be constrained.
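The convergent-dimension criterion described in the abstract can be sketched as follows. This is a minimal illustration that assumes an autoencoder has already been trained once per candidate latent dimension and its final reconstruction loss recorded. The function name `convergent_dimension`, the relative-improvement rule, and the tolerance `tol` are hypothetical choices made for this sketch; the abstract does not state the exact threshold rule the authors used.

```python
from typing import Sequence


def convergent_dimension(dims: Sequence[int], losses: Sequence[float], tol: float = 0.05) -> int:
    """Hypothetical diminishing-returns rule: return the smallest latent
    dimension after which moving to the next candidate dimension improves
    the autoencoder loss by less than `tol` (as a fraction of the current loss)."""
    if len(dims) != len(losses) or len(dims) < 2:
        raise ValueError("need at least two (dimension, loss) pairs of equal length")
    # Sort by latent dimension so the scan goes from small to large.
    pairs = sorted(zip(dims, losses))
    for (d_prev, l_prev), (_, l_next) in zip(pairs, pairs[1:]):
        # Relative loss reduction gained by adding latent dimensions.
        gain = (l_prev - l_next) / l_prev if l_prev > 0 else 0.0
        if gain < tol:
            # Returns diminish beyond d_prev: treat it as the convergent dimension.
            return d_prev
    # No plateau found within the tested range: fall back to the largest dimension.
    return pairs[-1][0]


if __name__ == "__main__":
    # Synthetic loss values for illustration only; not results from the paper.
    dims = [2, 4, 6, 8, 10, 12, 14]
    losses = [0.40, 0.20, 0.10, 0.06, 0.040, 0.039, 0.0385]
    print(convergent_dimension(dims, losses, tol=0.05))  # prints 10
```

Measuring the improvement relative to the current loss, rather than in absolute terms, keeps the rule independent of the loss scale, which varies between datasets and feature normalizations; in practice the tolerance would need to be tuned or replaced by whatever criterion the study actually applies.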

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639