Jisuanji kexue (Jan 2023)

Fast Storage System for Time-series Big Data Streams Based on Waterwheel Model

  • LU Mingchen, LYU Yanqi, LIU Ruicheng, JIN Peiquan

DOI
https://doi.org/10.11896/jsjkx.220900045
Journal volume & issue
Vol. 50, no. 1
pp. 25 – 33

Abstract

With the rapid development of the Internet of Things, the scale of sensor deployment has been growing in recent years. Large-scale sensor deployments generate massive amounts of streaming data every second, and the value of the data decreases over time. The storage system therefore needs to withstand the write pressure of high-speed streaming data and persist the data as fast as possible for subsequent query and analysis. This poses a considerable challenge to the write performance of the storage system. The fast storage system based on the waterwheel model can meet the fast storage requirements of high-speed time-series data streams in big data application scenarios. The proposed system is deployed between the high-speed streaming data and the underlying storage nodes. It uses multiple data buckets to build a logically rotating storage model (similar to the ancient Chinese waterwheel) and coordinates data writing and persistence by controlling the state of each data bucket. Waterwheel sends data buckets to different underlying storage nodes, so that the instantaneous write pressure is evenly distributed across multiple nodes and write throughput is improved through multi-node parallel writing. In the experiments, the waterwheel model is deployed on a stand-alone version of MongoDB and compared with distributed MongoDB. The results show that the proposed system effectively improves write throughput, reduces write latency, and offers good horizontal scalability.
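The bucket rotation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Waterwheel`, `Bucket`, and `State`, the three-state bucket lifecycle, the fixed bucket capacity, and the round-robin choice of storage node are all assumptions made for illustration, since the abstract does not specify them. Plain Python lists stand in for the underlying storage nodes, and buckets are flushed synchronously, whereas the described system relies on multi-node parallel writing.

```python
from enum import Enum


class State(Enum):
    """Hypothetical bucket states; the abstract does not name them."""
    EMPTY = "empty"        # ready to receive data
    FILLING = "filling"    # currently receiving writes
    FLUSHING = "flushing"  # full, being persisted to a storage node


class Bucket:
    def __init__(self):
        self.state = State.EMPTY
        self.records = []


class Waterwheel:
    """Ring of buckets placed between a data stream and storage nodes."""

    def __init__(self, num_buckets, capacity, nodes):
        self.buckets = [Bucket() for _ in range(num_buckets)]
        self.capacity = capacity   # records per bucket (assumed fixed)
        self.nodes = nodes         # each node is a list (stand-in for storage)
        self.current = 0           # index of the bucket being filled
        self.next_node = 0         # round-robin cursor over storage nodes
        self.buckets[0].state = State.FILLING

    def write(self, record):
        bucket = self.buckets[self.current]
        bucket.records.append(record)
        if len(bucket.records) >= self.capacity:
            self._rotate()

    def _rotate(self):
        # Mark the full bucket for persistence and turn the wheel one step.
        full = self.buckets[self.current]
        full.state = State.FLUSHING
        self.current = (self.current + 1) % len(self.buckets)
        self._persist(full)  # synchronous here; a real system would not block
        nxt = self.buckets[self.current]
        if nxt.state is not State.EMPTY:
            raise RuntimeError("no empty bucket available")
        nxt.state = State.FILLING

    def _persist(self, bucket):
        # Send the whole bucket to the next storage node, then recycle it.
        self.nodes[self.next_node].extend(bucket.records)
        self.next_node = (self.next_node + 1) % len(self.nodes)
        bucket.records = []
        bucket.state = State.EMPTY


# Usage: 12 readings, buckets of 2 records, spread over 3 stand-in nodes.
nodes = [[], [], []]
wheel = Waterwheel(num_buckets=4, capacity=2, nodes=nodes)
for i in range(12):
    wheel.write({"sensor": "s1", "ts": i, "value": i * 0.5})
```

Because each full bucket goes to the next node in turn, the burst of writes is spread evenly over the nodes rather than concentrated on one, which is the load-distribution idea the abstract describes.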

Keywords