Simulation Research on Fast Matching of Big Data Based on Spark

Guojian Xu; Mingyang Song; Zhenggang Leng; Zhenhong Jia

doi:10.1109/ACCESS.2023.3262989

IEEE Access (Jan 2023)

Affiliations

Guojian Xu: College of Information Science and Engineering, Xinjiang University, Ürümqi, China
Mingyang Song: College of Information Science and Engineering, Xinjiang University, Ürümqi, China
Zhenggang Leng: College of Information Science and Engineering, Xinjiang University, Ürümqi, China
Zhenhong Jia: College of Information Science and Engineering, Xinjiang University, Ürümqi, China

DOI: https://doi.org/10.1109/ACCESS.2023.3262989
Journal volume & issue: Vol. 11, pp. 32628–32635

Abstract

To address the low efficiency of real-time processing and matching of CNAME records in massive DNS log data, this paper proposes a parallel, enhanced Aho-Corasick (AC) automaton method based on Spark. The method is built on the Spark distributed cluster computing engine deployed on Hadoop, which provides stable, highly fault-tolerant storage of massive DNS log data and 24-hour real-time processing. The Spark cluster combines multi-threaded parallel computing with the improved AC automaton algorithm, which both reduces the memory occupied by trie construction and speeds up the matching of CNAME records in massive DNS logs. Simulation results show that the proposed method can quickly match CNAME records in massive DNS log data. Compared with the original AC algorithm, it is significantly more efficient and reduces both time complexity and storage space.
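
For readers unfamiliar with the baseline, the following is a minimal sketch of the classic (original) AC automaton that the abstract compares against. It is not the authors' improved algorithm and not their Spark pipeline. The function names `build_automaton` and `find_matches`, the domain patterns, and the sample log line are all hypothetical and chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build a classic Aho-Corasick automaton (trie + failure links)."""
    goto = [{}]   # goto[node][char] -> child node index
    fail = [0]    # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]    # out[node] -> patterns that end at this node

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        node = 0
        for ch in pat:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = nxt
            node = nxt
        out[node].append(pat)

    # Phase 2: breadth-first traversal to set failure links.
    queue = deque(goto[0].values())  # depth-1 nodes fail to the root
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target (shorter suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def find_matches(text, automaton):
    """Return (start_index, pattern) for every pattern occurrence in text."""
    goto, fail, out = automaton
    node = 0
    hits = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            hits.append((i - len(pat) + 1, pat))
    return hits


# Hypothetical CNAME targets and a hypothetical DNS log line.
automaton = build_automaton(["cdn.example.net", "example.net", "akamai"])
for start, pat in find_matches("www.site.com CNAME a1.cdn.example.net", automaton):
    print(start, pat)
```

The automaton is built once and then scans each record in a single pass, which is what makes it a natural fit for partition-level parallelism. In a Spark job, one plausible arrangement (an assumption here, not a detail taken from the abstract) is to broadcast the built automaton and call `find_matches` on each log line inside a per-partition transformation.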

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
