IEEE Access (Jan 2019)
Cluster-Scheduling Big Graph Traversal Task for Parallel Processing in Heterogeneous Cloud Based on DAG Transformation
Abstract
Task scheduling is key to fully utilizing the capabilities of a heterogeneous cloud for parallel processing of big graphs. Most graph processing systems adopt single-granularity scheduling mechanisms that do not consider the heterogeneity of the cloud, which leads to poor performance. To alleviate this problem, we draw on the directed acyclic graph (DAG)-based scheduling techniques accumulated in traditional parallel computing. We first present a streaming DAG-construction heuristic that transforms a big graph, together with the graph traversal algorithms to be carried out on it, into a DAG. We then propose a three-phase heterogeneous-aware cluster-scheduling algorithm to schedule the DAG onto a heterogeneous cloud for parallel processing. In the first phase, we design a parallel linear clustering algorithm to cluster the DAG into a series of linear clusters with different granularities. In the second phase, we design a heterogeneous-aware load balancing algorithm to map these clusters to different computational nodes of the cloud. In the last phase, we design a task ordering algorithm that assigns these clusters as-early-as-possible start times. The experimental results show that our scheme can generate high-quality schedules and improve the efficiency and performance of parallel processing of big graphs in the heterogeneous cloud.
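The three phases described above can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: it assumes a sequential longest-path linear clustering (the paper's version is parallel), a greedy mapping that picks the node with the earliest estimated finish time, and zero communication cost. The names `linear_clusters`, `map_clusters`, `asap_start_times`, and the `cost` and `speed` dictionaries are hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def topo_order(succ):
    """Return a topological order of a DAG given as {task: [successors]}."""
    indeg = {v: 0 for v in succ}
    for v in succ:
        for w in succ[v]:
            indeg[w] += 1
    ready = sorted(v for v in succ if indeg[v] == 0)
    order = []
    while ready:
        v = ready.pop(0)
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return order


def linear_clusters(succ, cost):
    """Phase 1 (assumed variant): repeatedly peel off the heaviest
    remaining path; each path becomes one linear cluster."""
    order = topo_order(succ)
    remaining = set(succ)
    clusters = []
    while remaining:
        best, nxt = {}, {}
        for v in reversed(order):
            if v not in remaining:
                continue
            best[v], nxt[v] = cost[v], None
            for w in succ[v]:
                if w in remaining and cost[v] + best[w] > best[v]:
                    best[v], nxt[v] = cost[v] + best[w], w
        v = max(sorted(best), key=best.get)  # head of the heaviest path
        path = []
        while v is not None:
            path.append(v)
            v = nxt[v]
        clusters.append(path)
        remaining -= set(path)
    return clusters


def map_clusters(clusters, cost, speed):
    """Phase 2 (assumed greedy): place heavy clusters first, each on the
    node whose estimated finish time (load / speed) would be smallest."""
    load = {n: 0.0 for n in speed}
    placement = {}
    for c in sorted(clusters, key=lambda c: -sum(cost[v] for v in c)):
        w = sum(cost[v] for v in c)
        n = min(sorted(speed), key=lambda n: (load[n] + w) / speed[n])
        load[n] += w
        for v in c:
            placement[v] = n
    return placement


def asap_start_times(succ, cost, speed, placement):
    """Phase 3: give every task the earliest start allowed by its
    predecessors and by its node (communication cost is ignored)."""
    pred = defaultdict(list)
    for v in succ:
        for w in succ[v]:
            pred[w].append(v)
    free = {n: 0.0 for n in speed}
    start, finish = {}, {}
    for v in topo_order(succ):
        n = placement[v]
        start[v] = max([free[n]] + [finish[p] for p in pred[v]])
        finish[v] = start[v] + cost[v] / speed[n]
        free[n] = finish[v]
    return start, finish


# Toy DAG: a -> b -> d and a -> c -> d, scheduled on two unequal nodes.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
speed = {"fast": 2.0, "slow": 1.0}

clusters = linear_clusters(succ, cost)
placement = map_clusters(clusters, cost, speed)
start, finish = asap_start_times(succ, cost, speed, placement)
```

In this toy run the heaviest path `a, b, d` forms one cluster and lands on the faster node, while the single-task cluster `c` goes to the slower node. This shows how clusters of different granularity can be matched to nodes of different capacity.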
Keywords