大数据 (May 2020)
Survey on data caching technology of distributed dataflow system
Abstract
The dataflow model is adopted by several dataflow systems for its advantages of high parallelism, pipelined processing, and functional programming. In distributed and heterogeneous dataflow systems, the rate at which data source operators produce data often does not match the rate at which data sink operators consume it, so data can be delayed and operators can sit idle. An efficient dataflow system therefore needs a dataflow caching system that ensures dataflow is cached and moved efficiently. This survey analyzes several distributed dataflow systems and distributed message queuing systems, and summarizes the degree to which current message queuing systems support dataflow caching. Finally, it introduces caching techniques and analyzes the requirements and research directions of future dataflow caching systems.
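
The speed mismatch between source and sink operators described in the abstract can be illustrated with a small sketch. It is not taken from any of the surveyed systems: it assumes a single in-process bounded queue standing in for the dataflow cache, and the names `source`, `sink`, `run_pipeline`, `capacity`, and `delay` are hypothetical, chosen only for illustration.

```python
import queue
import threading
import time

SENTINEL = object()  # hypothetical end-of-stream marker


def source(buf, n_items):
    """Fast source operator: emits items as quickly as the buffer allows."""
    for i in range(n_items):
        buf.put(i)  # blocks while the buffer is full, slowing the source down
    buf.put(SENTINEL)


def sink(buf, out, delay):
    """Slow sink operator: sleeps per item to simulate processing time."""
    while True:
        item = buf.get()  # blocks while the buffer is empty (sink is idle)
        if item is SENTINEL:
            break
        time.sleep(delay)
        out.append(item * 2)


def run_pipeline(n_items=20, capacity=4, delay=0.001):
    """Connect one source and one sink through a bounded buffer."""
    buf = queue.Queue(maxsize=capacity)  # stands in for the dataflow cache
    out = []
    producer = threading.Thread(target=source, args=(buf, n_items))
    consumer = threading.Thread(target=sink, args=(buf, out, delay))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return out


if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch the bounded buffer absorbs short bursts from the source and blocks it once `capacity` items are waiting, so the sink is never overrun and no items are dropped. A distributed dataflow cache would additionally have to handle network transfer, persistence, and multiple producers and consumers, none of which this example attempts.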