Defining the execution semantics of stream processing engines

Lorenzo Affetti; Riccardo Tommasini; Alessandro Margara; Gianpaolo Cugola; Emanuele Della Valle

doi:10.1186/s40537-017-0072-9

Journal of Big Data (Apr 2017)

Affiliations

Lorenzo Affetti: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, DEIB
Riccardo Tommasini: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, DEIB
Alessandro Margara: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, DEIB
Gianpaolo Cugola: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, DEIB
Emanuele Della Valle: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, DEIB

DOI: https://doi.org/10.1186/s40537-017-0072-9
Journal volume & issue: Vol. 4, no. 1
pp. 1 – 24

Abstract

The ability to process large volumes of data on the fly, as soon as they become available, is a fundamental requirement in today’s information systems. Modern distributed stream processing engines (SPEs) address this requirement and provide low-latency and high-throughput data stream processing in cluster platforms, offering high-level programming interfaces that abstract from low-level details such as data distribution and hardware failures. The last decade saw a rapid increase in the number of available SPEs. However, each SPE defines its own processing model, and standardized execution semantics have not yet emerged. This paper tackles this problem and analyzes the execution semantics of some widely adopted modern SPEs, namely Flink, Storm, Spark Streaming, Google Dataflow, and Azure Stream Analytics. We specifically target the notions of windowing and time, traditionally considered the key distinguishing factors that characterize the behavior of SPEs. We rely on the SECRET model, introduced in 2010 to analyze the windowing semantics of the SPEs available at that time. We show that SECRET models some aspects of the behavior of modern SPEs well, and we shed light on the evolution of SPEs after the introduction of SECRET by analyzing the elements that SECRET cannot fully capture. In this way, the paper contributes to the research in the area of stream processing by: (1) comparing and contrasting some widely used modern SPEs based on a formal model of their execution semantics; (2) discussing the evolution of SPEs since the introduction of the SECRET model; (3) suggesting promising research directions to guide further modeling efforts.

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
