IEEE Access (Jan 2022)
The Performance Optimization of Big Data Processing by Adaptive MapReduce Workflow
Abstract
This paper considers big data processing with MapReduce on volunteer computing in dynamic and opportunistic environments. It conducts a series of simulations to explore the relationship between the overall performance of volunteer overlays and the workloads of the big data problems they process. The simulations reveal optimization points in overlay size, beyond which adding more volunteers brings little benefit to overall performance. Based on this finding, the paper proposes a bootstrapping protocol that organizes volunteers into variable-size overlays, enabling workflows of single-round or multiple-round MapReduce, with one or multiple overlays per round. The variable overlays create an adaptive workflow during MapReduce processing so that the optimization points can be reached. As a further benefit, unneeded computing capacity can be released during computing once the optimization points are reached. The case study presents several optimized workflows formed by the proposed bootstrapping protocol to process the big data cases; these workflows reach the optimization points while dynamically balancing the workload. The experimental results demonstrate that the optimization strategies achieve either 36% or 71% higher performance than the plain MapReduce workflow and minimize the use of computing resources by releasing 12.5% to 75% of the volunteers during computing, whereas plain MapReduce must hold all volunteers until the end of computing. The extensibility of the simulation parameterization to more diverse real-world applications is also clarified.
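To make the notion of an "optimization point" concrete, the following is a minimal illustrative sketch, not code from the paper: it assumes an Amdahl-style speedup model and a marginal-gain threshold (both hypothetical parameters) and finds the overlay size beyond which adding one more volunteer yields negligible speedup, i.e. the size at which surplus volunteers could be released.

```python
# Illustrative sketch (not from the paper): locating an "optimization point"
# in overlay size. The speedup model (Amdahl-style, serial_fraction = 0.1)
# and the min_gain threshold are assumptions for demonstration only.

def speedup(n_volunteers: int, serial_fraction: float = 0.1) -> float:
    """Assumed Amdahl-style speedup for an overlay of n_volunteers."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_volunteers)

def optimization_point(max_size: int = 256, min_gain: float = 0.05) -> int:
    """Smallest overlay size at which the marginal speedup from adding one
    more volunteer drops below min_gain; volunteers beyond this size add
    little and could be released."""
    for n in range(1, max_size):
        if speedup(n + 1) - speedup(n) < min_gain:
            return n
    return max_size

if __name__ == "__main__":
    n_opt = optimization_point()
    print(f"optimization point: {n_opt} volunteers "
          f"(speedup {speedup(n_opt):.2f}x)")
```

Under these assumed parameters the marginal speedup decays quickly, which mirrors the paper's observation that growing an overlay past its optimization point brings little benefit.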
Keywords