Algorithms (May 2024)

Fault Location Method Based on Dynamic Operation and Maintenance Map and Common Alarm Points Analysis

  • Sheng Wu
  • Jihong Guan

DOI
https://doi.org/10.3390/a17050217
Journal volume & issue
Vol. 17, no. 5
p. 217

Abstract

In a distributed information system, the operational components (applications, operating systems, databases, servers, and networks) are numerous, and the access relationships among them are intricate. Each specialist domain tends to operate as a silo, and the linkage mechanisms between domains are insufficient, which makes it difficult to locate the infrastructure component that causes an exception in a particular application. Existing methods are effective only in narrow scenarios, and their accuracy and generalization remain limited. This paper proposes a novel fault location method based on dynamic operation maps and common alarm point analysis. During the fault period, the alarm entities are associated with the dynamic operation map, and common alarm points are obtained using graph-search addressing methods. These cover deployment-relationship common points, connection common points (physical and logical), and access-flow common points. Compared with knowledge graph approaches, the method avoids the complex process of knowledge graph construction, which makes it more concise and efficient. Compared with indicator correlation analysis methods, it adds configuration correlation information, which yields more precise fault positioning. In practical validation, its fault hit rate exceeds 82%, significantly better than that of the existing mainstream methods.
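
The idea of finding common alarm points by graph search can be illustrated with a minimal sketch. This is not the authors' implementation: the map contents, the component names, the function names `reachable` and `common_alarm_points`, and the ranking by total hop distance are all assumptions made for illustration. The sketch treats the operation map as a directed graph whose edges point from a component to the components it depends on (deployment, connection, or access-flow relationships alike), then intersects the sets of components reachable from each alarming entity.

```python
from collections import deque


def reachable(graph, start):
    """Return {node: hop distance} for every node reachable from start (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def common_alarm_points(graph, alarm_entities):
    """Components that every alarming entity depends on, nearest first.

    Ranking by summed hop distance is an illustrative assumption,
    not a rule taken from the paper.
    """
    dists = [reachable(graph, entity) for entity in alarm_entities]
    if not dists:
        return []
    common = set(dists[0])
    for d in dists[1:]:
        common &= set(d)
    return sorted(common, key=lambda n: (sum(d[n] for d in dists), n))


# Hypothetical operation map: each key depends on the components in its list.
ops_map = {
    "app-a": ["vm-1", "db-1"],
    "app-b": ["vm-2", "db-1"],
    "vm-1": ["host-1"],
    "vm-2": ["host-1"],
    "db-1": ["vm-3"],
    "vm-3": ["host-2"],
    "host-1": ["switch-1"],
    "host-2": ["switch-1"],
}

# Two applications raise alarms during the same fault period.
candidates = common_alarm_points(ops_map, ["app-a", "app-b"])
print(candidates)  # ['db-1', 'host-1', 'vm-3', 'host-2', 'switch-1']
```

In this toy map the shared database `db-1` ranks first because it is the closest component that both alarming applications depend on. A real system would presumably distinguish the three kinds of common points and weight them differently.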

Keywords