Anomaly Detection Algorithms for Real-Time Log Data Analysis at Scale

Andras Horvath; Andras Olah; Attila Pinter; Balint Siklosi; Gergely Lukacs; Istvan Z. Reguly; Kalman Tornai; Tamas Zsedrovits; Zoltan Mathe

doi:10.1109/access.2025.3594469

IEEE Access (Jan 2025)

Affiliations

Andras Horvath: Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Andras Olah: Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Attila Pinter: Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Balint Siklosi: Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Gergely Lukacs: Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Istvan Z. Reguly: Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Kalman Tornai: Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Tamas Zsedrovits: Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Zoltan Mathe: Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary

DOI: https://doi.org/10.1109/access.2025.3594469
Journal volume & issue: Vol. 13, pp. 136288–136311

Abstract

In recent years, Artificial Intelligence for IT Operations (AIOps) has gained popularity as a solution to various challenges in IT operations, particularly anomaly detection. Although numerous studies have focused on anomaly detection, they often overlook cloud-based systems and the vast amount of data they generate. Moreover, applying these results in practice and managing multiple IT systems across diverse computing environments present notable challenges. In this paper, we explore anomaly detection in real-world environments, including legacy systems and cloud platforms, by evaluating existing methods and introducing four novel anomaly detection algorithms: Llama Model-based Log Anomaly Detection (LMLAD), Log Clustering with Cosine Similarity for Log Anomaly Detection (LCCLAD), Convolutional Auto-Encoder based Log Anomaly Detection (CAELAD), and Statistical Clustering for Log Anomaly Detection (SCLAD). These algorithms vary in complexity and accuracy and offer tunable performance based on the specific requirements of the environment. In experiments on four log datasets (two private and two public), our best models improved F1-scores by 6–10% over state-of-the-art baselines, demonstrating the efficacy of our approach. Through empirical analysis, we show that while advanced techniques such as CAELAD often deliver high accuracy, simpler methods such as SCLAD can be equally effective in practical settings, depending on the complexity of the problem. Our results underscore the importance of striking the right balance between sophistication and simplicity, challenging the assumption that the most sophisticated methods are necessary for effective anomaly detection in real-world log data.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
