Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines

Kassiano J. Matteussi; Julio C. S. dos Anjos; Valderi R. Q. Leithardt; Claudio F. R. Geyer

doi:10.3390/s22134756

Sensors (Jun 2022)

Affiliations

Kassiano J. Matteussi: Institute of Informatics, Federal University of Rio Grande do Sul, UFRGS/PPGC, Porto Alegre 91501-970, RS, Brazil
Julio C. S. dos Anjos: Graduate Program in Teleinformatics Engineering, Federal University of Ceará, PPGETI/UFC, Center of Technology, Campus of Pici, Fortaleza 60455-970, CE, Brazil
Valderi R. Q. Leithardt: COPELABS, Universidade Lusófona de Humanidades e Tecnologias, 1749-024 Lisboa, Portugal
Claudio F. R. Geyer: Institute of Informatics, Federal University of Rio Grande do Sul, UFRGS/PPGC, Porto Alegre 91501-970, RS, Brazil

DOI: https://doi.org/10.3390/s22134756
Journal volume & issue: Vol. 22, no. 13, p. 4756

Abstract

The adoption of streaming applications has risen significantly over the last decade and has changed decision-making processes. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, and Flink. Spark Streaming, a widespread open-source implementation, processes data-intensive applications that often require large amounts of memory. However, the Spark Unified Memory Manager cannot properly manage sudden or intensive data surges and their related in-memory caching needs, resulting in performance and throughput degradation, high latency, a large number of garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines, for both stateless and stateful applications. Furthermore, the work points out the Spark Streaming limitations that lead to in-memory-based issues for data-intensive pipelines and stateful applications, and it indicates potential solutions.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
