Data pipeline approaches in serverless computing: a taxonomy, review, and research trends

Zahra Shojaee Rad; Mostafa Ghobaei-Arani

doi:10.1186/s40537-024-00939-0

Journal of Big Data (Jun 2024)

Affiliations

Zahra Shojaee Rad: Department of Computer Engineering, Qom Branch, Islamic Azad University
Mostafa Ghobaei-Arani: Department of Computer Engineering, Qom Branch, Islamic Azad University

DOI: https://doi.org/10.1186/s40537-024-00939-0
Journal volume & issue: Vol. 11, no. 1
pp. 1 – 42

Abstract

Serverless computing has gained significant popularity due to its scalability, cost-effectiveness, and ease of deployment. With the exponential growth of data, organizations face the challenge of efficiently processing and analyzing vast amounts of data in a serverless environment. Data pipelines play a crucial role in managing and transforming data within serverless architectures. This paper provides a taxonomy of data pipeline approaches in serverless computing. Based on architectural features, data processing techniques, and workflow orchestration mechanisms, these approaches are classified into three primary categories: heuristic-based, machine learning-based, and framework-based. Furthermore, a systematic review of existing data pipeline frameworks and tools is provided, encompassing their strengths, limitations, and real-world use cases. The advantages and disadvantages of each approach, as well as the challenges and performance metrics that influence their effectiveness, are examined. Every data pipeline approach, whether framework-based, heuristic-based, or machine learning-based, has its own advantages and disadvantages and suits specific use cases. Hence, it is crucial to assess the trade-offs among complexity, performance, cost, and scalability when selecting a data pipeline approach. Finally, the paper highlights a number of open issues and future research directions for data pipelines in serverless computing, including scalability, fault tolerance, real-time data processing, data workflow orchestration, and function state management, along with performance and cost in serverless computing environments.

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
