IEEE Access (Jan 2023)
Simulation Research on Fast Matching of Big Data Based on Spark
Abstract
To solve the problem of low efficiency in real-time processing and matching of CNAME records in massive DNS log data, a parallel AC automaton enhancement method based on Spark was proposed. The method is based on the Spark distributed cluster computing engine of Hadoop, which ensures the stability of massive DNS log data storage with high fault tolerance and 24-hour real-time processing. At the same time, the Spark distributed cluster uses the multi-thread parallel computing method combined with the improved AC automaton algorithm, which not only reduces the memory occupied by trie construction, but also improves the efficiency of rapid matching of CNAME records of massive DNS logs. Simulation results show that the proposed method can quickly match CNAME records of massive DNS log data. Compared with the original AC algorithm, the efficiency is significantly improved, and the time complexity and storage space are reduced.
Keywords