Array (Mar 2024)

Towards efficient multi-granular anomaly detection in distributed systems

  • Chao Tu
  • Ming Chen
  • Liwen Zhang
  • Long Zhao
  • Di Wu
  • Ziyang Yue

Journal volume & issue
Vol. 21
p. 100330

Abstract

Distributed systems often consist of a large number of computing and data nodes, which makes efficient and accurate anomaly detection both important and challenging. In general, we need to determine not only whether an anomaly has occurred at a certain time (the time-level anomaly), but also whether anomalies occur in a given node (the node-level anomaly) and which key performance indicators (KPIs) are anomalous (the KPI-level anomaly); that is, we need to perform multi-granular anomaly detection in distributed systems. However, most existing algorithms focus only on time-level anomalies in centralized systems. For distributed systems, a simple approach is to train a model for each node and then detect anomalies independently, but the resulting cost of model inference is unacceptable in practice. Therefore, we propose a Multi-Granular Anomaly Detection (MGAD) framework that uses a tree structure to perform anomaly detection hierarchically, from the node level to the time and KPI levels, which greatly reduces the cost of model inference. Specifically, at the time level, we propose a novel model named Masked Sliding Spatial-Temporal Adversarial Network (MS2TAN) that considers spatial and temporal dependencies simultaneously. Extensive experiments with real-world data offer insights into the performance of the proposals, showing that MGAD is at least 5× faster at inference than the baselines.
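The hierarchical idea in the abstract (screen at the node level first, then examine time and KPI levels only where needed) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the thresholds, the toy data, and the simple statistical checks are hypothetical stand-ins, not MGAD's tree structure or the MS2TAN model.

```python
from statistics import mean, median, pstdev


def node_level(data, rel_threshold=0.2):
    """Cheap screen (hypothetical): flag nodes whose mean for any KPI
    deviates from the fleet median by more than rel_threshold."""
    kpis = list(next(iter(data.values())))
    fleet_median = {
        k: median([mean(series[k]) for series in data.values()]) for k in kpis
    }
    flagged = []
    for node, series in data.items():
        for k in kpis:
            base = fleet_median[k]
            if base != 0 and abs(mean(series[k]) - base) / abs(base) > rel_threshold:
                flagged.append(node)
                break
    return flagged


def time_and_kpi_level(series, z_threshold=2.0):
    """Finer check for one node (a z-score stand-in, not MS2TAN):
    map each anomalous time index to the KPIs that are anomalous there."""
    result = {}
    for kpi, values in series.items():
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # a constant series has no outliers
        for t, v in enumerate(values):
            if abs(v - mu) / sigma > z_threshold:
                result.setdefault(t, []).append(kpi)
    return result


def detect(data):
    """Run the finer check only on nodes flagged by the cheap screen."""
    return {node: time_and_kpi_level(data[node]) for node in node_level(data)}


# Toy data (invented): node -> KPI -> values over 8 time steps.
fleet = {
    "node-a": {"cpu": [10, 11, 10, 9, 10, 11, 10, 9], "mem": [50] * 8},
    "node-b": {"cpu": [10, 10, 11, 9, 10, 10, 11, 9], "mem": [50] * 8},
    "node-c": {"cpu": [10, 10, 10, 10, 10, 90, 10, 10], "mem": [50] * 8},
}

print(detect(fleet))  # {'node-c': {5: ['cpu']}}
```

In this toy run only one of the three nodes reaches the finer check, which is the source of the inference savings the abstract describes; the actual savings depend on how the real framework screens nodes.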

Keywords