IEEE Access (Jan 2020)
Research on a Distributed Processing Model Based on Kafka for Large-Scale Seismic Waveform Data
Abstract
For storage and recovery requirements on large-scale seismic waveform data of the National Earthquake Data Backup Center (NEDBC), a distributed cluster processing model based on Kafka message queues is designed to optimize the inbound efficiency of seismic waveform data stored in HBase at NEDBC. Firstly, compare the characteristics of big data storage architectures with that of traditional disk array storage architectures. Secondly, realize seismic waveform data analysis and periodic truncation, and write HBase in NoSQL record form through Spark Streaming cluster. Finally, compare and test the read/write performance of the data processing process of the proposed big data platform with that of traditional storage architectures. Results show that the seismic waveform data processing architecture based on Kafka designed and implemented in this paper has a higher read/write speed than the traditional architecture on the basis of the redundancy capability of NEDBC data backup, which verifies the validity and practicability of the proposed approach.
Keywords