IEEE Access (Jan 2024)
Trends, Approaches, and Gaps in Scientific Workflow Scheduling: A Systematic Review
Abstract
This systematic review analyzes scheduling algorithms designed for scientific workflows, particularly those handling Big Data. By examining research published between 2010 and 2023, we aim to provide a structured overview of the field, highlighting key trends and identifying existing gaps. This review will be valuable to researchers and practitioners in computer science who develop workflow management systems or who deploy and manage complex computational workflows that process Big Data across various scientific domains, including astronomy, bioinformatics and genomics, climate science, earth and environmental sciences, high-energy physics, and social sciences. These workflows are frequently executed on high-performance, cloud, fog, grid, IoT, and hybrid computing systems. We focus on scheduling algorithms tailored for batch and stream processing workflows, cataloging them based on their optimization objectives (e.g., monetary cost, makespan, resource efficiency, energy consumption), optimization techniques (e.g., heuristics, meta-heuristics, machine learning), and assessment criteria (e.g., workflow complexity, computing environment scale, dataset size). Our analysis reveals that cloud computing remains the dominant environment for scientific workflow scheduling, but significant gaps persist in addressing stream processing workflows, resource efficiency, energy consumption, data movement, and applications in high-performance computing, IoT, and hybrid systems. Moreover, the potential of machine learning techniques for scheduling algorithms remains largely untapped. By cataloging existing scheduling algorithms, this review aims to support future research and practical implementations in the field of scientific workflow scheduling.
Keywords