Jisuanji kexue (Mar 2022)
Reducing Head-of-Line Blocking on Network in Hadoop Clusters
Abstract
Users of big data analytics systems want tasks to finish as quickly as possible. During task execution, however, both the network and computation can become resource bottlenecks that slow tasks down. Observing and analyzing a big data analytics system leads to two conclusions: 1) the data-parallel framework should switch among multiple working modes according to the current resource bottleneck; 2) sub-task scheduling should account for new tasks that may arrive in the future, not only the tasks already submitted. Based on these observations, Duopoly, a new task scheduling system, is designed and implemented. It consists of two parts: cans, a network scheduler that is aware of computational resources, and nats, a sub-task scheduler that is aware of network resources. Duopoly is evaluated on small-scale physical clusters and in large-scale simulations, and the results show that it reduces average task completion time by 37.30% to 76.16% compared with existing work.
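Observation 1) can be illustrated with a toy sketch of bottleneck-dependent mode selection. Everything in it is hypothetical: the mode names, the `StageEstimate` fields, and the rule of comparing an estimated network transfer time against an estimated compute time are assumptions made for illustration, not the decision logic actually used by Duopoly, cans, or nats.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    # Hypothetical working modes; the abstract does not name the real ones.
    NETWORK_BOUND = "network-bound"
    COMPUTE_BOUND = "compute-bound"


@dataclass
class StageEstimate:
    # Hypothetical per-stage estimates; assumes all values are positive.
    bytes_to_shuffle: float  # data the stage must move over the network (bytes)
    link_bandwidth: float    # available network bandwidth (bytes/s)
    cpu_work: float          # total compute work (core-seconds)
    free_cores: int          # idle cores available to the stage


def current_bottleneck(est: StageEstimate) -> Mode:
    """Return the mode for whichever resource would take longer on its own.

    Ties are treated as network-bound.
    """
    network_time = est.bytes_to_shuffle / est.link_bandwidth
    compute_time = est.cpu_work / est.free_cores
    if network_time >= compute_time:
        return Mode.NETWORK_BOUND
    return Mode.COMPUTE_BOUND


if __name__ == "__main__":
    # 8 GB over 1.25 GB/s (6.4 s) vs. 20 core-s on 16 cores (1.25 s)
    print(current_bottleneck(StageEstimate(8e9, 1.25e9, 20.0, 16)))
    # 1 GB over 1.25 GB/s (0.8 s) vs. 64 core-s on 8 cores (8 s)
    print(current_bottleneck(StageEstimate(1e9, 1.25e9, 64.0, 8)))
```

In a running cluster such a check would have to be re-evaluated as available bandwidth and idle cores change, which is why a framework following observation 1) would switch modes over time rather than fix one at submission.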
Keywords