IEEE Access (Jan 2021)

IT Infrastructure Anomaly Detection and Failure Handling: A Systematic Literature Review Focusing on Datasets, Log Preprocessing, Machine & Deep Learning Approaches and Automated Tool

  • Deepali Arun Bhanage,
  • Ambika Vishal Pawar,
  • Ketan Kotecha

DOI
https://doi.org/10.1109/ACCESS.2021.3128283
Journal volume & issue
Vol. 9
pp. 156392 – 156421

Abstract

Reliability assurance is crucial for the components of IT infrastructure. The unavailability of any element or connection results in downtime and causes monetary and performance losses, so reliability engineering has become an active topic of investigation. System logs have become essential in IT infrastructure monitoring for failure detection, root cause analysis, and troubleshooting. This Systematic Literature Review (SLR) presents a detailed analysis based on the qualitative and performance merits of the datasets used, the technical approaches applied, and the automated tools developed. The full-text review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology. 102 articles were extracted from Scopus, IEEE Xplore, WoS, and ACM for thorough examination, and a few supplementary articles were identified by applying the snowballing technique. The study emphasizes the use of system logs for anomaly or failure detection and prediction, and it summarizes the automated tools under various quality merit criteria. The SLR found that machine learning and deep learning-based classification approaches applied to selected features achieve better performance than traditional rule-based and method-based approaches. Additionally, the paper discusses research gaps in the existing literature and provides future research directions. The primary intent of this SLR is to identify and examine the tools and techniques proposed in the existing literature to mitigate IT infrastructure downtime. The survey is intended to help prospective researchers understand the pros and cons of current methods and select a suitable approach to the problems they have identified in the field of IT infrastructure.

Keywords