IEEE Access (Jan 2018)

Hadoop Configuration Tuning With Ensemble Modeling and Metaheuristic Optimization

  • Xingcheng Hua,
  • Michael C. Huang,
  • Peng Liu

DOI
https://doi.org/10.1109/ACCESS.2018.2857852
Journal volume & issue
Vol. 6
pp. 44161 – 44174

Abstract

MapReduce is a popular programming model for big data processing. Although the distributed processing framework Hadoop greatly reduces the development complexity of MapReduce applications, fine-tuning Hadoop systems for optimal performance remains a major challenge. Configuration tuning is one of the most effective means of improving the performance of MapReduce applications on Hadoop systems, which invariably adopt the default configuration. However, the huge Hadoop configuration parameter space makes it impractical to explore the parameter combinations exhaustively. In this paper, we propose H-Tune, an effective Hadoop configuration tuning approach for MapReduce applications. We design a nonintrusive performance profiler, whose runtime overhead remains below 2%, to capture the runtime details of MapReduce applications and generate their performance evaluations. Based on the performance profiles, the execution predictor uses ensemble modeling to construct a two-level fusion model for each application, considering both the Hadoop configuration and the input data size. Leveraging the execution predictor, a metaheuristic-based configuration optimizer searches for the optimal configuration for a given application. Experimental results demonstrate that the optimal Hadoop configuration is often application-specific and data-specific, and that it is better to take all relevant configuration parameters into consideration and optimize them together. On average, H-Tune improves the performance of MapReduce applications by factors of 1.5× and 9.6× over the state-of-the-art approach and the default configuration, respectively.

Keywords