IEEE Access (Jan 2024)

Probabilistic Temporal Fusion Transformers for Large-Scale KPI Anomaly Detection

  • Haoran Luo,
  • Yongkun Zheng,
  • Kang Chen,
  • Shuo Zhao

DOI
https://doi.org/10.1109/ACCESS.2024.3353201
Journal volume & issue
Vol. 12
pp. 9123–9137

Abstract

This paper introduces a new generic and scalable framework for large-scale time series prediction and unsupervised anomaly detection. Most state-of-the-art time series anomaly detection techniques, which are largely based on neural networks, train a separate network per time series. However, a typical modern microservice system consists of hundreds of active nodes/instances. To monitor the performance of such a system, we often need to keep track of thousands of time series describing different aspects of the system, including CPU usage, call latency, and workload. We introduce a methodology that groups metrics of the same type and predicts hundreds of metrics concurrently with a single neural network model with shared parameters. The model also integrates probabilistic representations with Temporal Fusion Transformers for better performance. On a real-world dataset, the proposed model achieves up to a 50% improvement in MSE.
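The core idea the abstract describes can be illustrated with a minimal sketch: one shared-parameter forecaster trained across many KPIs of the same metric group, with a probabilistic (quantile) output whose prediction interval drives anomaly scoring. This is not the authors' implementation; the model below uses a plain GRU encoder rather than a full Temporal Fusion Transformer, and the window size, quantiles, and series-embedding design are illustrative assumptions.

```python
# Hedged sketch of shared-parameter, probabilistic multi-series forecasting
# for anomaly detection. Not the paper's code; hyperparameters are assumptions.
import torch
import torch.nn as nn


class SharedQuantileForecaster(nn.Module):
    """One model with shared weights for all series of a metric group."""

    def __init__(self, n_series: int, hidden: int = 64,
                 quantiles=(0.1, 0.5, 0.9)):
        super().__init__()
        self.quantiles = quantiles
        # A per-series embedding lets the single network specialize per KPI instance.
        self.embed = nn.Embedding(n_series, hidden)
        self.encoder = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, len(quantiles))  # one output per quantile

    def forward(self, x, series_id):
        # x: (batch, window, 1) past values; series_id: (batch,) KPI index
        _, h = self.encoder(x)                    # h: (1, batch, hidden)
        h = h.squeeze(0) + self.embed(series_id)  # condition on which series
        return self.head(h)                       # (batch, n_quantiles)


def quantile_loss(pred, target, quantiles=(0.1, 0.5, 0.9)):
    """Pinball loss: trains the probabilistic (quantile) output."""
    losses = []
    for i, q in enumerate(quantiles):
        err = target - pred[:, i]
        losses.append(torch.max(q * err, (q - 1) * err))
    return torch.mean(torch.stack(losses))


def anomaly_score(pred, observed):
    """Flag observations falling outside the predicted [q10, q90] interval."""
    lo, hi = pred[:, 0], pred[:, -1]
    return ((observed < lo) | (observed > hi)).float()


if __name__ == "__main__":
    # Toy batch: 8 windows of 48 past points drawn from 100 different KPIs.
    model = SharedQuantileForecaster(n_series=100)
    x = torch.randn(8, 48, 1)
    sid = torch.randint(0, 100, (8,))
    y = torch.randn(8)

    pred = model(x, sid)
    loss = quantile_loss(pred, y)
    loss.backward()
    print("loss:", loss.item(), "anomalies:", anomaly_score(pred, y).sum().item())
```

In this sketch the grouping of metrics by type corresponds to training one such model per metric group, so thousands of KPIs are covered by a handful of shared-parameter models rather than one network per time series.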

Keywords