Towards Supercomputing Categorizing the Maliciousness upon Cybersecurity Blacklists with Concept Drift

M. V. Carriegos; N. DeCastro-García; D. Escudero

doi:10.1155/2023/5780357

Computational and Mathematical Methods (Jan 2023)

Affiliations

M. V. Carriegos: Departamento de Matemáticas
N. DeCastro-García: Departamento de Matemáticas
D. Escudero: RIASC

Journal volume & issue: Vol. 2023

Abstract

In this article, we carry out a case study to optimize the classification of the maliciousness of cybersecurity events by IP address using machine learning techniques. The optimization focuses on time complexity. First, we use the extreme gradient boosting (XGBoost) model. Second, we parallelize the machine learning algorithm to study the effect of using different numbers of cores. We classify the maliciousness of cybersecurity events in both a biclass and a multiclass scenario. All the experiments use a well-known optimal set of features: the geolocation information of the IP address. However, the geolocation features of an IP address can change over time, and the relation between an IP address and its maliciousness label can change if the address is tested several times. As a result, the models' performance could degrade, because the information learned from past training samples may not generalize well to new samples. This situation is known as concept drift, so it is necessary to study whether the proposed optimization still works in a concept drift scenario. The results show that concept drift does not degrade the models. Moreover, the boosting algorithms achieve competitive or better performance than similar research works in the biclass scenario and an effective categorization in the multiclass case. Regarding high-performance computing resources, the most efficient setting is reached using five nodes.
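The concept drift concern above (a model trained on past samples may not generalize to newer ones) is usually checked by evaluating the trained model on chronologically ordered test windows and watching for a drop in performance. The following is a minimal Python sketch of such a check, assuming the model has already been trained on older samples and has produced predictions for each later window. The function names, the use of plain accuracy, and the 0.05 tolerance are illustrative assumptions, not the authors' evaluation protocol.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WindowReport:
    window: int        # index of the chronological test window
    accuracy: float    # accuracy of the model on this window
    drop: float        # baseline accuracy minus this window's accuracy
    degraded: bool     # True if the drop exceeds the chosen tolerance


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions equal to the true labels (works for biclass or multiclass)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def drift_report(
    windows: Sequence[Tuple[Sequence, Sequence]], tolerance: float = 0.05
) -> List[WindowReport]:
    """Compare each time-ordered (y_true, y_pred) window against the first one.

    Assumption (hypothetical setup): the model was trained only on samples
    older than windows[0], so later windows probe whether performance decays.
    """
    if not windows:
        raise ValueError("at least one window is required")
    baseline = accuracy(*windows[0])
    reports = []
    for i, (y_true, y_pred) in enumerate(windows):
        acc = accuracy(y_true, y_pred)
        drop = baseline - acc
        reports.append(WindowReport(i, acc, drop, drop > tolerance))
    return reports
```

For example, with two hypothetical windows `([1, 0, 1, 1], [1, 0, 1, 1])` and `([0, 0, 1, 1], [0, 1, 1, 1])`, the second window's accuracy falls from 1.0 to 0.75 and is flagged as degraded. Comparing against the first window rather than the training score keeps the check independent of how the model was fitted.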

Published in Computational and Mathematical Methods

ISSN: 2577-7408 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics; Technology: Technology (General): Industrial engineering. Management engineering: Applied mathematics. Quantitative methods
Website: https://onlinelibrary.wiley.com/journal/cmm