Tongxin xuebao (Apr 2022)

Research on a real-time receiving scheme of streaming data

  • Xiaoyan ZHANG,
  • Zhihao LIU,
  • Xiaofeng DU,
  • Tianbo LU

Journal volume & issue
Vol. 43
pp. 154 – 163

Abstract


This paper discusses a common scenario in modern data warehouse systems: a large volume of streaming data must be received, joined with the existing data on disk, and then stored in the warehouse. Building on existing research, a more efficient data receiving scheme was proposed that sets disk paging rationally and uses cache modules to spread the disk I/O pressure. A consistent hash function was then introduced and extended to the distributed environment, yielding D-CACHEJOIN, an algorithm for distributed environments. The cost model of the algorithm was derived theoretically, and simulation experiments were performed on data that follow a Zipfian distribution. The experimental results show that, in realistic application scenarios, the proposed algorithm is more efficient than existing algorithms and can be extended to distributed environments quickly and easily.
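The general idea the abstract combines can be illustrated with a minimal sketch, assuming a simplified model: a consistent hash ring routes each join key to a worker node, and each worker joins stream tuples against its partition of the master data, serving frequent keys from an in-memory cache and falling back to a (simulated) disk lookup otherwise. This is not the paper's D-CACHEJOIN and does not model disk paging or the paper's cost model. The names `ConsistentHashRing`, `CacheJoinNode` and `simulate`, the node names, the virtual-node count, the cache policy (admit a key after two arrivals, evict the least frequent key) and all parameter values are hypothetical, chosen only for illustration. The disk is a plain `dict`.

```python
import bisect
import hashlib
import random
from collections import Counter

def stable_hash(key) -> int:
    # 64-bit hash taken from MD5 so key placement is identical across runs.
    return int.from_bytes(hashlib.md5(str(key).encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Hypothetical ring: maps join keys to nodes; each node owns `vnodes` points."""
    def __init__(self, nodes, vnodes=64):
        points = sorted((stable_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._nodes[idx]

class CacheJoinNode:
    """Hypothetical worker: joins stream tuples with its master-data partition."""
    def __init__(self, capacity, admit_after=2):
        self.disk = {}             # stand-in for disk-resident master data
        self.cache = {}            # hot master tuples held in memory
        self.freq = Counter()      # arrivals per key seen on the stream
        self.capacity = capacity
        self.admit_after = admit_after
        self.disk_reads = 0

    def join(self, key, payload):
        self.freq[key] += 1
        if key in self.cache:                   # fast path: no disk access
            return (key, payload, self.cache[key])
        self.disk_reads += 1                    # slow path: simulated disk lookup
        master = self.disk.get(key)
        if master is not None and self.freq[key] >= self.admit_after:
            self._admit(key, master)
        return None if master is None else (key, payload, master)

    def _admit(self, key, master):
        if len(self.cache) >= self.capacity:
            if not self.cache:                  # capacity 0: never cache
                return
            # Evict the least frequent cached key, only if the newcomer is hotter.
            coldest = min(self.cache, key=lambda k: self.freq[k])
            if self.freq[coldest] >= self.freq[key]:
                return
            del self.cache[coldest]
        self.cache[key] = master

def simulate(num_keys=1000, num_tuples=20000, nodes=("n1", "n2", "n3"),
             capacity=50, skew=1.0, seed=7):
    ring = ConsistentHashRing(nodes)
    workers = {n: CacheJoinNode(capacity) for n in nodes}
    for k in range(num_keys):                   # partition master data via the ring
        workers[ring.node_for(k)].disk[k] = f"master-{k}"
    rng = random.Random(seed)
    # Zipf-like popularity: key of rank r is drawn with weight 1 / (r + 1) ** skew.
    weights = [1 / (rank + 1) ** skew for rank in range(num_keys)]
    stream = rng.choices(range(num_keys), weights=weights, k=num_tuples)
    # Route each stream tuple to the same node that holds its master tuple.
    joined = sum(workers[ring.node_for(k)].join(k, i) is not None
                 for i, k in enumerate(stream))
    disk_reads = sum(w.disk_reads for w in workers.values())
    return joined, disk_reads
```

`simulate()` returns the number of joined stream tuples and the number of simulated disk reads. Because the stream is skewed, a small per-node cache absorbs a large share of lookups. The consistent hash ring is used because adding a node only remaps the keys that land on the new node's arcs, so the other nodes keep their master-data partitions and caches.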

Keywords