IEEE Access (Jan 2023)

PatCluster: A Top-Down Log Parsing Method Based on Frequent Words

  • Yu Bai,
  • Yongwei Chi,
  • Dan Zhao

DOI: https://doi.org/10.1109/ACCESS.2023.3239012
Journal volume & issue: Vol. 11, pp. 8275–8282

Abstract


Logs combine static message-type fields with dynamic variable fields, and the accuracy of log parsing affects the results of subsequent log analysis tasks. To this end, this paper introduces PatCluster, an offline log parsing method based on frequent words. The method first generates root nodes through preprocessing. It then counts word frequencies and uses the most frequent word as the segmentation condition to refine the template generated by the root node. Applying this step recursively forms pattern nodes for all elements of each node and generates the corresponding templates, which accomplishes log pattern mining. Log patterns are mined from coarse to fine under relatively few assumptions, and the depth of pattern fitting can be controlled by adjusting the termination condition. The optimized algorithm model also considers the maximum extent to which the log template matches the tokens in the log message. Experimental results show that the method effectively improves log parsing quality, achieves higher parsing accuracy than the compared methods, and is better suited to logs with complex structures.
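The top-down, frequent-word refinement described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: root nodes are formed by grouping messages by token count, tokens containing digits are masked as `<*>` during preprocessing, a node is split into messages that contain its most frequent non-constant word and messages that do not, and the termination condition is a hypothetical `max_depth`/`min_size` pair. The helper names (`tokenize`, `make_template`, `split_node`, `patcluster_sketch`) are likewise hypothetical. The matching-extent optimization mentioned in the abstract is not modeled.

```python
import re
from collections import Counter, defaultdict

# Assumed preprocessing rule: any token containing a digit is treated as a variable.
DIGIT = re.compile(r"\d")


def tokenize(line: str) -> list[str]:
    """Split on whitespace and mask digit-bearing tokens as <*> (assumption)."""
    return ["<*>" if DIGIT.search(tok) else tok for tok in line.split()]


def make_template(msgs: list[list[str]]) -> str:
    """Keep a token where every message in the node agrees; otherwise emit <*>."""
    return " ".join(col[0] if len(set(col)) == 1 else "<*>" for col in zip(*msgs))


def split_node(msgs, depth, max_depth, min_size, out):
    """Recursively refine a node using its most frequent non-constant word."""
    # Count each word once per message in which it appears.
    counts = Counter(w for m in msgs for w in set(m))
    # Words found in every message are already constant, so they cannot split the node.
    candidates = [(c, w) for w, c in counts.items() if c < len(msgs) and w != "<*>"]
    # Hypothetical termination condition: depth limit, small node, or nothing to split on.
    if depth >= max_depth or len(msgs) <= min_size or not candidates:
        out.append((make_template(msgs), len(msgs)))
        return
    _, word = max(candidates)  # highest count; ties go to the lexicographically largest word
    with_word = [m for m in msgs if word in m]
    without_word = [m for m in msgs if word not in m]
    for child in (with_word, without_word):
        split_node(child, depth + 1, max_depth, min_size, out)


def patcluster_sketch(lines, max_depth=3, min_size=1):
    """Return (template, count) pairs mined top-down from raw log lines."""
    roots = defaultdict(list)  # assumed root nodes: one per token count
    for line in lines:
        toks = tokenize(line)
        roots[len(toks)].append(toks)
    out = []
    for length in sorted(roots):
        split_node(roots[length], 0, max_depth, min_size, out)
    return out


logs = [
    "User alice logged in",
    "User bob logged in",
    "User carol logged out",
    "User dave logged out",
    "Disk 3 is full",
]
for template, count in patcluster_sketch(logs, max_depth=2):
    print(count, template)
# 2 User <*> logged out
# 2 User <*> logged in
# 1 Disk <*> is full
```

Raising `max_depth` to 3 in this toy example lets the recursion continue splitting on user names, producing five single-message templates. This mirrors, in a simplified way, the abstract's point that the termination condition controls how closely the patterns fit the data.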

Keywords