Jisuanji kexue yu tansuo (Oct 2022)

Research on Method of Log Pattern Extracting in High-Performance Computing Environment

  • WANG Xiaodong, ZHAO Yining, XIAO Haili, WANG Xiaoning, CHI Xuebin

DOI
https://doi.org/10.3778/j.issn.1673-9418.2103066
Journal volume & issue
Vol. 16, no. 10
pp. 2264 – 2272

Abstract


Log analysis plays an important role in keeping computer systems running stably. However, logs are usually unstructured, which hinders automatic analysis, so automatically categorizing logs and turning them into structured data is of great practical significance. This paper proposes the LDmatch algorithm, a log pattern extraction algorithm based on word matching rate. Traditional log matching algorithms compute similarity by matching words one-to-one. LDmatch instead computes the similarity between two logs from the longest common subsequence (LCS) of the words they contain, and classifies logs on that basis. LDmatch can also obtain and update log templates in real time. In addition, the algorithm stores its pattern warehouse in a hash-table-based data structure, which refines the classification of logs and reduces the number of comparisons needed during log matching, thereby improving matching efficiency. To verify the algorithm's advantages, it is applied to an open-source dataset and to actual log data generated by CNGrid, and its results are compared with those of a variety of other log pattern extraction algorithms. The experimental results demonstrate the algorithm's advantages in accuracy, robustness and efficiency.
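The abstract only outlines LDmatch, so the following is a minimal Python sketch of the two ideas it names, LCS-based word similarity and a hash-table pattern warehouse. It is not the authors' implementation. Several details are assumptions not stated in the abstract: logs are tokenised on whitespace, similarity is the LCS length divided by the longer log's word count, the 0.5 threshold is arbitrary, the hash table is keyed by word count, and words that fall off the LCS are masked with a `<*>` wildcard. The names `lcs`, `similarity`, `merge` and `PatternWarehouse` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

WILDCARD = "<*>"   # hypothetical placeholder for variable words
THRESHOLD = 0.5    # hypothetical similarity cut-off, not taken from the paper


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two word lists (standard dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one LCS
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def similarity(a: List[str], b: List[str]) -> float:
    """Assumed word matching rate: LCS length over the longer log's word count."""
    return len(lcs(a, b)) / max(len(a), len(b), 1)


def merge(template: List[str], words: List[str]) -> List[str]:
    """Keep template words that lie on the LCS; mask every other position."""
    common = lcs(template, words)
    merged, k = [], 0
    for tok in template:
        if k < len(common) and tok == common[k]:
            merged.append(tok)
            k += 1
        else:
            merged.append(WILDCARD)
    return merged


class PatternWarehouse:
    """Templates stored in a dict of buckets keyed by word count (assumed key)."""

    def __init__(self, threshold: float = THRESHOLD) -> None:
        self.threshold = threshold
        self.buckets: Dict[int, List[List[str]]] = defaultdict(list)

    def add(self, line: str) -> str:
        """Match a log against its bucket, update the template, return it."""
        words = line.split()
        bucket = self.buckets[len(words)]  # only this bucket is searched
        best, best_sim = -1, 0.0
        for idx, tpl in enumerate(bucket):
            sim = similarity(tpl, words)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= self.threshold:
            bucket[best] = merge(bucket[best], words)  # online template update
            return " ".join(bucket[best])
        bucket.append(words)  # no match: the log starts a new pattern
        return " ".join(words)


wh = PatternWarehouse()
for msg in ["Job 101 started on node cn01",
            "Job 102 started on node cn07",
            "Disk quota exceeded for user alice"]:
    print(wh.add(msg))
# Job 101 started on node cn01
# Job <*> started on node <*>
# Disk quota exceeded for user alice
```

Bucketing by word count is one simple way to illustrate the abstract's point that a hash table limits how many stored templates each new log is compared with; the key that LDmatch actually uses is not given in the abstract.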

Keywords