Journal of Cloud Computing: Advances, Systems and Applications (Apr 2023)

Container cascade fault detection based on spatial–temporal correlation in cloud environment

  • Ningjiang Chen,
  • Qingwei Zhong,
  • Yifei Liu,
  • Weitao Liu,
  • Lin Bai,
  • Liangqing Hu

DOI
https://doi.org/10.1186/s13677-023-00438-2
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 20

Abstract

Containers are lightweight, numerous, and interdependent, which makes them prone to cascading faults and increases both the probability of faults and the difficulty of detecting them. Existing detection methods are usually based on a cascade fault model built with traditional association analysis. The traditional model does not consider the history of fault cascades or the spatial correlation among containers, which lowers detection effectiveness. In addition, the imbalance of fault data in the cloud environment interferes with detection. To address these problems, this paper proposes a cascade fault detection method based on spatial–temporal correlation in the cloud environment. First, a container cascade fault relationship model is constructed by extracting the spatial–temporal correlation from historical container faults. Second, a container fault model learning method that combines dynamic feedback data sampling with ensemble learning is designed to address the imbalance of fault data. Then, a real-time detection mechanism for container cascade faults is proposed. The experimental results show that, compared with existing fault detection methods, the proposed method effectively improves detection accuracy, recall, and F1 score.
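
The abstract reports accuracy, recall, and F1 and names data imbalance as a source of interference. The following minimal sketch is not taken from the paper; it only illustrates, on made-up labels, why accuracy alone can mislead when faults are rare. The function name `detection_metrics` and both prediction lists are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return (accuracy, recall, f1) for binary labels, where 1 marks a fault."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # faults detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, left alone
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed faults

    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, recall, f1


# Made-up, imbalanced ground truth: 95 normal samples followed by 5 faults.
y_true = [0] * 95 + [1] * 5

# Hypothetical detector A: never reports a fault.
always_normal = [0] * 100

# Hypothetical detector B: 2 false alarms, 4 of 5 faults detected.
catches_most = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

acc_a, rec_a, f1_a = detection_metrics(y_true, always_normal)
acc_b, rec_b, f1_b = detection_metrics(y_true, catches_most)
print(f"A: accuracy={acc_a:.2f} recall={rec_a:.2f} f1={f1_a:.2f}")
print(f"B: accuracy={acc_b:.2f} recall={rec_b:.2f} f1={f1_b:.2f}")
```

Detector A reaches high accuracy without finding a single fault, which is why recall and F1 are reported alongside accuracy when fault samples are scarce.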

Keywords