Joint Scheduling of Tasks and Network Flows in Big Data Clusters

Lei Yang; Xuxun Liu; Jiannong Cao; Zhenyu Wang

doi:10.1109/ACCESS.2018.2878864

IEEE Access (Jan 2018)

Affiliations

Lei Yang: School of Software Engineering, South China University of Technology, Guangzhou, China
Xuxun Liu: School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
Jiannong Cao: Department of Computing, The Hong Kong Polytechnic University, Hong Kong
Zhenyu Wang: School of Software Engineering, South China University of Technology, Guangzhou, China

DOI: https://doi.org/10.1109/ACCESS.2018.2878864
Journal volume & issue: Vol. 6
pp. 66600 – 66611

Abstract

As a growing number of big data processing platforms, such as Hadoop, Spark, and Storm, appear and typically share the resources of a data center, it has become both important and challenging to schedule the jobs from these platforms onto the underlying data center resources so that the overall job completion time is minimized. To solve this problem, existing work focuses either on task-level scheduling techniques, such as Quincy and delay scheduling, or on network flow scheduling techniques, such as D3 and Preemptive Distributed Quick (PDQ) flow scheduling. These approaches schedule tasks and network flows separately and therefore cannot achieve optimal performance. The reason is that task scheduling that disregards the available network bandwidth may produce a task placement that causes serious network congestion and thus leads to long data transmission times. In this paper, we propose a joint scheduling technique that coordinates task placement with the scheduling of the network flows arising from these tasks. We develop a software-defined network (SDN)-based online scheduling framework that selects the task placement based on the available bandwidth on the SDN switches and, at the same time, optimally allocates bandwidth to each data flow. Comprehensive trace-driven simulations show that the joint scheduling technique can make full use of the network bandwidth and thus reduce the job completion time by 55% on average compared with the benchmark methods.
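
The idea of choosing a task placement with the available bandwidth in mind can be illustrated with a small sketch. This is not the framework described in the paper, whose algorithm is not given in the abstract; it is a toy greedy heuristic under stated assumptions. The names `Task`, `transfer_time`, and `place_tasks` are hypothetical, bandwidth is modeled as a single number per (source, destination) pair, and each flow is assumed to take the whole remaining bandwidth of its path.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    source: str     # node that holds the task's input data
    data_mb: float  # input size in MB


def transfer_time(task, node, bandwidth):
    """Estimated seconds to move the task's input to `node` (0 if local)."""
    if node == task.source:
        return 0.0
    bw = bandwidth.get((task.source, node), 0.0)
    return float("inf") if bw <= 0 else task.data_mb / bw


def place_tasks(tasks, free_slots, bandwidth):
    """Toy greedy bandwidth-aware placement (illustrative assumption only).

    free_slots: dict node -> number of free task slots
    bandwidth:  dict (src, dst) -> available bandwidth in MB/s
    Returns dict task name -> (chosen node, estimated transfer time in s).
    """
    slots = dict(free_slots)  # work on copies so the inputs stay unchanged
    bw = dict(bandwidth)
    placement = {}
    # Place the tasks with the largest inputs first.
    for task in sorted(tasks, key=lambda t: -t.data_mb):
        candidates = [n for n, s in slots.items() if s > 0]
        if not candidates:
            raise RuntimeError("no free slots left")
        # Pick the node with the smallest estimated transfer time.
        best = min(candidates, key=lambda n: transfer_time(task, n, bw))
        placement[task.name] = (best, transfer_time(task, best, bw))
        slots[best] -= 1
        if best != task.source:
            # Simplification: the flow consumes all bandwidth on its path.
            bw[(task.source, best)] = 0.0
    return placement


tasks = [Task("t1", "A", 1000.0), Task("t2", "A", 500.0), Task("t3", "A", 200.0)]
result = place_tasks(
    tasks,
    free_slots={"A": 1, "B": 1, "C": 1},
    bandwidth={("A", "B"): 100.0, ("A", "C"): 50.0},
)
print(result)
```

In this example the largest task stays on the node that holds its data, and the remaining tasks are sent over the links with the most available bandwidth. A placement that ignored bandwidth could instead send the 500 MB input over the 50 MB/s link and double its transfer time.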

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
