Jasmine: A new Active Learning approach to combat cybercrime

Jan Klein; Sandjai Bhulai; Mark Hoogendoorn; Rob van der Mei

Machine Learning with Applications (Sep 2022)

Affiliations

Jan Klein: Department of Stochastics, Centrum Wiskunde & Informatica, Science Park 123, 1098XG, Amsterdam, The Netherlands; Corresponding author.
Sandjai Bhulai: Department of Mathematics, Vrije Universiteit, De Boelelaan 1111, 1081HV, Amsterdam, The Netherlands
Mark Hoogendoorn: Department of Computer Science, Vrije Universiteit, De Boelelaan 1111, 1081HV, Amsterdam, The Netherlands
Rob van der Mei: Department of Stochastics, Centrum Wiskunde & Informatica, Science Park 123, 1098XG, Amsterdam, The Netherlands

Journal volume & issue: Vol. 9, p. 100351

Abstract

One reason the deployment of network intrusion detection methods falls short is the lack of realistic labeled datasets, which makes it challenging to develop and compare techniques. This shortage is caused by the large amount of effort it takes a cyber expert to classify network connections. It has raised the need for methods that learn, from both labeled and unlabeled data, which observations are best to present to the human expert. Hence, Active Learning (AL) methods are of interest. In this paper, we propose a new hybrid AL method called Jasmine. First, it uses the uncertainty score and anomaly score to determine how suitable each observation is for querying, i.e., how likely it is to enhance classification. Second, Jasmine introduces dynamic updating, which allows the model to adjust the balance between querying uncertain, anomalous, and randomly selected observations. As a result, Jasmine is able to learn the best query strategy during the labeling process. This is in contrast to the other AL methods in cybersecurity, which all have static, predetermined query functions. We show that dynamic updating, and therefore Jasmine, consistently obtains good results that are more robust than those from querying only uncertainties, only anomalies, or a fixed combination of the two.
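
The idea of mixing uncertain, anomalous, and random queries under an adjustable balance can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify Jasmine's scoring or update rules, so the function names (`uncertainty_score`, `select_queries`, `update_fractions`), the uncertainty formula, the `gains` feedback signal, and the `step` parameter are all hypothetical choices made for illustration only.

```python
import numpy as np


def uncertainty_score(proba_positive):
    """Assumed uncertainty measure: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    p = np.asarray(proba_positive, dtype=float)
    return 1.0 - 2.0 * np.abs(p - 0.5)


def select_queries(uncertainty, anomaly, fractions, budget, rng):
    """Pick up to `budget` distinct indices, split across three strategies.

    `fractions` = (uncertain, anomalous, random) and is assumed to sum to 1.
    """
    n = len(uncertainty)
    n_unc = int(round(fractions[0] * budget))
    n_ano = min(int(round(fractions[1] * budget)), budget - n_unc)

    chosen, taken = [], set()

    def take(order, k):
        # Walk the ranking and keep the first k indices not chosen yet.
        for i in order:
            if k == 0:
                break
            i = int(i)
            if i not in taken:
                taken.add(i)
                chosen.append(i)
                k -= 1

    take(np.argsort(-np.asarray(uncertainty)), n_unc)  # most uncertain first
    take(np.argsort(-np.asarray(anomaly)), n_ano)      # most anomalous first
    take(rng.permutation(n), budget - len(chosen))     # remainder at random
    return chosen


def update_fractions(fractions, gains, step=0.1):
    """Hypothetical update: shift the mix toward strategies with larger gains.

    `gains` is an assumed per-strategy feedback signal, e.g. the improvement
    in validation performance attributed to each strategy's queries.
    """
    f = np.asarray(fractions, dtype=float)
    g = np.clip(np.asarray(gains, dtype=float), 0.0, None)
    if g.sum() == 0.0:
        return f  # no evidence, keep the current balance
    new = (1.0 - step) * f + step * g / g.sum()
    return new / new.sum()
```

In a labeling loop, `select_queries` would be called once per round with the current `fractions`, the expert would label the returned observations, and `update_fractions` would then adjust the balance before the next round. A static query function corresponds to never calling `update_fractions`.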

Published in Machine Learning with Applications

ISSN: 2666-8270 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General): Cybernetics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/machine-learning-with-applications
