Dianxin kexue (May 2024)

A method of building alarm causality graph for anomaly events in network services

  • ZHANG Lei,
  • JING Yuhan,
  • HE Bo,
  • QI Qi,
  • CHEN Chen,
  • WANG Jingyu

Journal volume & issue
Vol. 40
pp. 152 – 164

Abstract

In network service systems, an anomaly event often triggers a large number of alarms, forming an alarm storm. Operators must then spend considerable time and effort searching these alarm data for key information and identifying the root cause of the anomaly. To reduce the number of alarms that operators need to handle and to extract the root alarms of an alarm storm automatically, a method for generating an alarm causality graph, based on an analysis of how network service alarms propagate, was proposed and applied to extract the key information of the alarm storm when anomaly events occur. Experiments on real datasets from an operator's online network management system verified the effectiveness of the alarm causality graph in extracting alarm storm summaries, and a real-world case was used to analyze the physical significance of the method. The results show that the recall rate of alarm storm summary extraction can reach 96%, so the vast majority of key information is retained. In addition, the compression rate of alarms can reach 66.5% for alarm codes that are difficult to compress.
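As a rough illustration of the quantities mentioned in the abstract, the following Python sketch treats root alarms as the nodes of a causality graph that have no incoming causal edge, and then computes a compression rate and a recall rate for the resulting summary. The abstract does not give the paper's definitions, so everything here is an assumption: the function names, the alarm codes, the example edges, and the formulas (compression rate as the fraction of alarms removed, recall as the fraction of key alarms retained) are hypothetical and may differ from the authors' method.

```python
def root_alarms(alarms, causal_edges):
    """Return alarms with no incoming causal edge (assumed notion of 'root')."""
    caused = {effect for _, effect in causal_edges}
    return [a for a in alarms if a not in caused]


def compression_rate(total_count, kept_count):
    """Fraction of alarms removed from the operator's view (assumed definition)."""
    return 1 - kept_count / total_count


def recall(summary, key_alarms):
    """Fraction of known key alarms that survive in the summary (assumed definition)."""
    key = set(key_alarms)
    return len(key & set(summary)) / len(key)


# Hypothetical alarm codes and (cause, effect) edges, for illustration only.
alarms = ["LINK_DOWN", "BGP_PEER_LOST", "SERVICE_TIMEOUT", "HIGH_CPU"]
edges = [
    ("LINK_DOWN", "BGP_PEER_LOST"),
    ("BGP_PEER_LOST", "SERVICE_TIMEOUT"),
]

summary = root_alarms(alarms, edges)
print(summary)
print(compression_rate(len(alarms), len(summary)))
print(recall(summary, ["LINK_DOWN"]))
```

In this toy case the two alarms that are explained by an upstream alarm are dropped, while the unexplained ones are kept for the operator.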

Keywords