Journal of Cloud Computing: Advances, Systems and Applications (May 2024)

MTG_CD: Multi-scale learnable transformation graph for fault classification and diagnosis in microservices

  • Juan Chen,
  • Rui Zhang,
  • Peng Chen,
  • Jianhua Ren,
  • Zongling Wu,
  • Yang Wang,
  • Xi Li,
  • Ling Xiong

DOI
https://doi.org/10.1186/s13677-024-00666-0
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 15

Abstract

Read online

Abstract The rapid advancement of microservice architecture in the cloud has led to the necessity of effectively detecting, classifying, and diagnosing run failures in microservice applications. Due to the high dynamics of cloud environments and the complex dependencies between microservices, it is challenging to achieve robust real-time system fault identification. This paper proposes an interpretable fault diagnosis framework tailored for microservice architecture, namely Multi-scale Learnable Transformation Graph for Fault Classification and Diagnosis(MTG_CD). Firstly, we employ multi-scale neural transformation and graph structure adjacency matrix learning to enhance data diversity while extracting temporal-structural features from system monitoring metrics Secondly, a graph convolutional network (GCN) is utilized to fuse the extracted temporal-structural features in a multi-feature modeling approach, which helps to improve the accuracy of anomaly detection. To identify the root cause of system faults, we finally conduct a coarse-grained level diagnosis and exploration after obtaining the results of classifying the fault data. We evaluate the performance of MTG_CD on the microservice benchmark SockShop, demonstrating its superiority over several baseline methods in detecting CPU usage overhead, memory leak, and network delay faults. The average macro F1 score improves by 14.05%.

Keywords