IEEE Access (Jan 2021)

A Data-Aware Scheduling Strategy for Executing Large-Scale Distributed Workflows

  • Salvatore Giampa,
  • Loris Belcastro,
  • Fabrizio Marozzo,
  • Domenico Talia,
  • Paolo Trunfio

DOI
https://doi.org/10.1109/ACCESS.2021.3067815
Journal volume & issue
Vol. 9
pp. 47354 – 47364

Abstract

Task scheduling is a key component of the efficient execution of data-intensive applications in distributed environments, in which many machines must be coordinated to reduce execution times and bandwidth consumption. This paper presents ADAGE, a data-aware scheduler designed to efficiently execute data-intensive workflows on large-scale computers. The proposed scheduler is based on three key features: (i) critical path analysis, for discovering the critical tasks of a workflow and reducing data transfers between nodes; (ii) work giving, a new dynamic planning strategy for migrating tasks from overloaded to unloaded nodes; and (iii) task replication, which executes task replicas on different nodes to improve both execution time and fault tolerance. Experiments performed on a distributed computing environment composed of up to 1,024 processing nodes show that ADAGE achieves better performance than existing scheduling systems, obtaining an average reduction of up to 66% in execution time.
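
As an illustration of the first feature, critical path analysis on a workflow can be sketched as a longest-path computation over the task graph. This is a generic textbook version and not ADAGE's implementation: the function name `critical_path`, the `durations` and `deps` inputs, and the use of estimated task run times as weights are all assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (length, path) of the longest duration-weighted path in a DAG.

    durations: dict mapping task -> estimated run time (hypothetical input)
    deps:      dict mapping task -> set of tasks it depends on
    """
    # Visit tasks so that every predecessor is processed before its successors.
    graph = {task: deps.get(task, ()) for task in durations}
    order = TopologicalSorter(graph).static_order()

    finish = {}     # earliest finish time of each task
    best_pred = {}  # predecessor that determines that finish time
    for task in order:
        preds = deps.get(task, ())
        prev = max(preds, key=lambda p: finish[p], default=None)
        best_pred[task] = prev
        finish[task] = durations[task] + (finish[prev] if prev is not None else 0)

    # Walk back from the task that finishes last to recover the path.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    path.reverse()
    return length, path


if __name__ == "__main__":
    durations = {"a": 3, "b": 2, "c": 4, "d": 1}
    deps = {"c": {"a", "b"}, "d": {"c"}}
    print(critical_path(durations, deps))
```

In a data-aware setting, the tasks on the returned path would be natural candidates for placement on the same node, since each consumes the output of the previous one; how ADAGE actually uses the critical path is described in the paper itself.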

Keywords