IEEE Access (Jan 2020)
Joint Online Coflow Optimization Across Geo-Distributed Datacenters
Abstract
Growing volumes of data are generated, stored, and processed across geographically distributed datacenters. Processing such geo-distributed datasets requires considering task or job placement, heterogeneous available link bandwidth, and routing decisions. Optimizing the transmission of inter-datacenter flows is therefore important, especially for coflows, which capture application-level semantics. However, prior coflow scheduling solutions either ignore routing or assume fixed endpoint placement. This article addresses the problem of jointly optimizing endpoint placement, coflow scheduling, and coflow routing to minimize the average coflow completion time (CCT) across geo-distributed datacenters. It first presents the system model and problem formulation, then derives a randomized approximation algorithm for a single coflow. It then proposes an online algorithm to handle the more complex case of multiple coflows. Theoretical analysis proves that the proposed algorithms achieve a non-trivial competitive ratio. Results from extensive simulations show that, compared with state-of-the-art solutions, the proposed algorithms significantly reduce the CCT of coflows while having similar algorithm run time.
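To make the optimization objective concrete, the following is a minimal sketch of how average CCT can be computed. It assumes the common definition that a coflow completes only when its last constituent flow finishes, and that its CCT is measured from the coflow's arrival time. The `Coflow` record and the function names are hypothetical illustrations; the sketch does not implement the paper's placement, scheduling, or routing algorithms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Coflow:
    """Hypothetical record: coflow arrival time and the finish times of its flows (seconds)."""
    arrival: float
    flow_finish_times: List[float]


def coflow_completion_time(c: Coflow) -> float:
    # A coflow is complete only when its slowest (last-finishing) flow finishes.
    return max(c.flow_finish_times) - c.arrival


def average_cct(coflows: List[Coflow]) -> float:
    # Objective to minimize: the mean CCT over all coflows.
    return sum(coflow_completion_time(c) for c in coflows) / len(coflows)


if __name__ == "__main__":
    coflows = [
        Coflow(arrival=0.0, flow_finish_times=[2.0, 5.0, 3.0]),  # CCT = 5.0
        Coflow(arrival=1.0, flow_finish_times=[4.0, 2.5]),       # CCT = 3.0
    ]
    print(average_cct(coflows))  # 4.0
```

Because a coflow's CCT is set by its last flow, speeding up any flow other than the slowest one does not reduce CCT. This is why placement, scheduling, and routing decisions need to be considered jointly rather than per flow.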
Keywords