KEYSTROKE DYNAMICS ANALYSIS USING MACHINE LEARNING METHODS

Nataliya SHABLIY; Serhii LUPENKO; Nadiia LUTSYK; Oleh YASNIY; Olha MALYSHEVSKA

doi:10.23743/acs-2021-30

Applied Computer Science (Dec 2021)

Affiliations

Nataliya SHABLIY: Ternopil Ivan Puluj National Technical University, Faculty of Computer Information Systems and Software Engineering, Computer Systems and Networks Department, Ternopil, Ukraine
Serhii LUPENKO: Ternopil Ivan Puluj National Technical University, Faculty of Computer Information Systems and Software Engineering, Computer Systems and Networks Department, Ternopil, Ukraine
Nadiia LUTSYK: Ternopil Ivan Puluj National Technical University, Faculty of Computer Information Systems and Software Engineering, Computer Systems and Networks Department, Ternopil, Ukraine
Oleh YASNIY: Ternopil Ivan Puluj National Technical University, Faculty of Computer Information Systems and Software Engineering, Computer Systems and Networks Department, Ternopil, Ukraine
Olha MALYSHEVSKA: Ivano-Frankivsk National Medical University, Department of Hygiene and Ecology, Ivano-Frankivsk, Ukraine

DOI: https://doi.org/10.23743/acs-2021-30
Journal volume & issue: Vol. 17, no. 4, pp. 75–83

Abstract

The primary objective of the paper was to identify a user from their keystroke dynamics using machine learning methods. This problem can be formulated as a classification task. To solve it, four supervised machine learning methods were employed: logistic regression, support vector machines, random forest, and a neural network. Each of three users typed the same 7-character word 600 times. Each row of the dataset consists of 7 values, each being the length of time for which the corresponding key was held down. The ground truth value is the user id. Before the classification methods were applied, the features were standardized to z-scores. Classification metrics were obtained for each method: precision, recall, F1-score, support, prediction, and area under the receiver operating characteristic curve (AUC). The AUC scores were high for all methods. The lowest AUC score, 0.928, was obtained with the logistic regression classifier, and the highest with the neural network classifier. Support vector machines and random forest scored slightly lower than the neural network. The same pattern holds for precision, recall, and F1-score. Nevertheless, the classification metrics are high in every case. Therefore, machine learning methods can be used effectively to classify users based on their keystroke patterns. The neural network is the recommended method for this kind of problem.
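The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It assumes a dataset shaped as described (3 users, 600 rows each, 7 key hold times per row), but the hold times are synthetic values drawn from made-up per-user means, not the paper's data. The function names `zscore_columns` and `per_class_metrics` are hypothetical, and no classifier is trained here; the sketch only shows how features are standardized to z-scores and how per-class precision, recall, F1-score, and support follow from true and predicted labels.

```python
import random
import statistics


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit population std."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each true label."""
    result = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        result[label] = (precision, recall, f1, tp + fn)
    return result


# Synthetic stand-in for the dataset: the per-user mean hold times (ms)
# below are assumptions chosen for illustration only.
random.seed(0)
HYPOTHETICAL_MEAN_HOLD_MS = {0: 90.0, 1: 110.0, 2: 130.0}
rows, labels = [], []
for user_id, mean_ms in HYPOTHETICAL_MEAN_HOLD_MS.items():
    for _ in range(600):
        rows.append([random.gauss(mean_ms, 15.0) for _ in range(7)])
        labels.append(user_id)

standardized = zscore_columns(rows)

# Metrics on a tiny hand-made prediction vector (not a model output).
demo_true = [0, 0, 1, 1, 2, 2]
demo_pred = [0, 1, 1, 1, 2, 0]
demo_metrics = per_class_metrics(demo_true, demo_pred)
```

In practice the standardized rows would be passed to a classifier, and the statistics used for standardization would be computed on the training split only, so that no information from the test split leaks into training.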

Published in Applied Computer Science

ISSN: 1895-3735 (Print); 2353-6977 (Online)
Publisher: Polish Association for Knowledge Promotion
Country of publisher: Poland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.acs.pollub.pl/
