Document similarity for error prediction

Péter Marjai; Péter Lehotay-Kéry; Attila Kiss

doi:10.1080/24751839.2021.1893496

Journal of Information and Telecommunication (Oct 2021)

Affiliations

Péter Marjai: ELTE Eötvös Loránd University
Péter Lehotay-Kéry: ELTE Eötvös Loránd University
Attila Kiss: ELTE Eötvös Loránd University

DOI: https://doi.org/10.1080/24751839.2021.1893496
Journal volume & issue: Vol. 5, no. 4, pp. 407–420

Abstract

Networking equipment is in ever-increasing use. These devices log their operations; however, errors can occur that cause a device to restart, and different errors may be preceded by different patterns in the logs. Our main goal is to predict the upcoming error from the log lines of the current file. To achieve this, we use document similarity, a key concept in information retrieval that indicates how alike (or different) documents are. In this paper, we study the effectiveness of prediction based on the cosine similarity, Jaccard similarity, and Euclidean distance of the rows preceding restarts. We combine these measures with features such as TF-IDF, Doc2Vec, and LSH, among others. Since networking devices produce large numbers of log files, we use Spark for big data computing.
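
The three measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain whitespace tokenization and raw token counts as features, whereas the paper reports using features such as TF-IDF, Doc2Vec, and LSH on Spark. The function names and the sample log lines are hypothetical.

```python
import math
from collections import Counter


def tokenize(line):
    """Split a log line into lowercase whitespace-separated tokens (an assumed scheme)."""
    return line.lower().split()


def cosine_similarity(a, b):
    """Cosine of the angle between the token-count vectors of two log lines."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty line has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the shared token set divided by the size of the combined token set."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # two empty lines are treated as identical
    return len(sa & sb) / len(sa | sb)


def euclidean_distance(a, b):
    """Straight-line distance between the token-count vectors of two log lines."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in ca.keys() | cb.keys()))


# Hypothetical log lines, invented for illustration only.
line_a = "link down on port 3"
line_b = "link down on port 7"

print(cosine_similarity(line_a, line_b))
print(jaccard_similarity(line_a, line_b))
print(euclidean_distance(line_a, line_b))
```

Cosine and Jaccard are similarities (higher means more alike), while Euclidean is a distance (lower means more alike), so any ranking that mixes them has to account for the opposite orientation.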

Published in Journal of Information and Telecommunication

ISSN: 2475-1839 (Print); 2475-1847 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.tandfonline.com/journals/tjit
