IEEE Access (Jan 2018)
Optimal Task Scheduling for Distributed Cluster With Active Storage Devices and Accelerated Nodes
Abstract
With advancements in compute-intensive and memory-bound applications, the need for faster and more energy-efficient processing platforms continues to grow. In support of these advancements, heterogeneous platforms have been proposed to enhance performance and efficiency in the cloud. These platforms include field-programmable gate arrays and graphics processing units in addition to general-purpose processors. Furthermore, there is strong interest in advancing active solid-state drives to support both storage and computation. In this paper, we present a generic formulation to support the modeling of such a heterogeneous cloud environment, without being specific to a particular cloud platform such as Spark or Hadoop. We represent the cloud as a collection of clients, middleware control nodes, and high-performance compute nodes (HPNs), where the HPNs represent the options of advanced compute technologies in a heterogeneous cloud. The objective of the paper is to present a simple and efficient formulation for scheduling applications in such a heterogeneous cloud. Consistent with recent software modeling of artificial intelligence applications, we propose to map applications into directed acyclic graph representations of tasks. The optimization problem is then formulated to infer the best scheduling of tasks on the HPNs, while minimizing the overall execution time and data communication delays between nodes. Unlike existing scheduling algorithms that assume equal performance across nodes, our formulation explicitly takes into account the different compute capabilities of the heterogeneous nodes. The resulting task schedule is then evaluated to provide insights into the performance gains of the proposed heterogeneous cloud computing environment. The results show improved performance when comparing the proposed task scheduling algorithm with the genetic algorithm and heterogeneous earliest finish time algorithms.
We also show the performance gains achieved with the optimal task scheduling on a heterogeneous cloud system as compared with a conventional CPU-only cloud system.
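To make the scheduling problem concrete, the following is a minimal illustrative sketch, not the formulation used in the paper. It assumes a hypothetical four-task directed acyclic graph, made-up execution times per node type (`EXEC`), and a single made-up communication delay (`COMM`) charged whenever a dependency crosses nodes. It searches all task-to-node assignments exhaustively under one fixed topological order, so it is only practical for toy instances.

```python
from itertools import product

# Hypothetical 4-task DAG: task -> list of predecessor tasks.
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
ORDER = ["A", "B", "C", "D"]  # one valid topological order (assumed fixed)

# Hypothetical execution times of each task on each node type.
EXEC = {
    "A": {"cpu": 4, "gpu": 2},
    "B": {"cpu": 6, "gpu": 2},
    "C": {"cpu": 3, "gpu": 3},
    "D": {"cpu": 5, "gpu": 1},
}
COMM = 1  # hypothetical delay when a dependency crosses nodes


def makespan(assign):
    """Finish time of the last task for a given task -> node assignment."""
    node_free = {}  # time at which each node becomes idle
    finish = {}     # finish time of each scheduled task
    for task in ORDER:
        node = assign[task]
        # A task is ready once all predecessors finished and their data arrived.
        ready = max(
            (finish[p] + (COMM if assign[p] != node else 0) for p in PREDS[task]),
            default=0,
        )
        start = max(ready, node_free.get(node, 0))
        finish[task] = start + EXEC[task][node]
        node_free[node] = finish[task]
    return max(finish.values())


def best_assignment(nodes):
    """Exhaustively try every assignment; return (makespan, assignment)."""
    best = None
    for choice in product(nodes, repeat=len(ORDER)):
        assign = dict(zip(ORDER, choice))
        span = makespan(assign)
        if best is None or span < best[0]:
            best = (span, assign)
    return best


if __name__ == "__main__":
    print("CPU-only:     ", best_assignment(["cpu"]))
    print("Heterogeneous:", best_assignment(["cpu", "gpu"]))
```

With these made-up numbers, the CPU-only makespan is 18 time units and the best heterogeneous makespan is 8. The figures only illustrate how node-specific execution times and communication delays enter the objective; they say nothing about the gains reported in the paper.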
Keywords