IEEE Access (Jan 2019)

A Survey of Distributed Data Stream Processing Frameworks

  • Haruna Isah
  • Tariq Abughofa
  • Sazia Mahfuz
  • Dharmitha Ajerla
  • Farhana Zulkernine
  • Shahzad Khan

DOI
https://doi.org/10.1109/ACCESS.2019.2946884
Journal volume & issue
Vol. 7
pp. 154300 – 154316

Abstract

Big data processing systems are becoming more stream-oriented: distributed, low-latency computational frameworks process each data record continuously as it arrives. As stream processing technology matures and more organizations invest in digital transformation, new applications of stream analytics will be identified and implemented across a wide spectrum of industries. One challenge in developing a streaming analytics infrastructure is selecting the right stream processing framework for a given use case. To address this issue, in this paper we present a taxonomy, a comparative study of distributed data stream processing and analytics frameworks, and a critical review of representative open source (Storm, Spark Streaming, Flink, Kafka Streams) and commercial (IBM Streams) distributed data stream processing frameworks. The paper also reports our ongoing work on a multilevel streaming analytics architecture that can serve as a guide for organizations and individuals planning to implement a real-time data stream processing and analytics framework.

Keywords