IEEE Access (Jan 2024)

QHB+: Accelerated Configuration Optimization for Automated Performance Tuning of Spark SQL Applications

  • Deokyeon Jang,
  • Hyunsik Yoon,
  • Kijung Jung,
  • Yon Dohn Chung

DOI
https://doi.org/10.1109/ACCESS.2024.3391333
Journal volume & issue
Vol. 12
pp. 60138 – 60148

Abstract

Apache Spark is a widely used solution for big data processing because of its efficiency and speed. One of its modules, Spark SQL, serves as a prominent big data query engine. However, executing Spark SQL applications on massive data can be time-intensive, and the execution time can vary significantly depending on the configuration. Recent studies try to reduce application execution times by searching for optimal configurations. While Bayesian optimization is recognized in these studies as a powerful method for configuration optimization, it incurs high computational cost and long computation times, especially when dealing with large search spaces. To address these challenges, we propose QHB+, designed to rapidly search for optimal configurations. QHB+ applies optimization methods based on the Successive Halving Algorithm, which perform well in hyperparameter optimization of machine learning models, to configuration optimization of Spark SQL applications. Through empirical evaluations against established benchmarks, we show the efficiency of QHB+, highlighting it as a swift alternative to conventional optimization methods for optimizing Spark SQL configurations.
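
For readers unfamiliar with the Successive Halving Algorithm mentioned in the abstract, the following is a minimal generic sketch of the idea, not the authors' QHB+ implementation. It assumes a hypothetical `evaluate(config, budget)` function returning a cost (lower is better); the toy cost function, the choice of `spark.sql.shuffle.partitions` as the only tuned parameter, and the values `eta=3` and `min_budget=1` are illustrative assumptions only.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Generic successive halving: repeatedly keep the best 1/eta candidates.

    configs  : list of candidate configurations (any objects)
    evaluate : callable(config, budget) -> cost, lower is better (assumed interface)
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving configuration at the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        # Keep the best 1/eta (at least one) and give the survivors more budget.
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]


def toy_cost(config, budget):
    # Hypothetical stand-in for a measured execution time. A real tuner would
    # run the Spark SQL workload, with `budget` controlling how much of it
    # (for example, how many queries) is executed. This toy ignores `budget`.
    return abs(config["spark.sql.shuffle.partitions"] - 200)


rng = random.Random(0)
candidates = [
    {"spark.sql.shuffle.partitions": rng.randint(8, 1000)} for _ in range(27)
]
best = successive_halving(candidates, toy_cost)
print(best)
```

With 27 candidates and `eta=3`, the loop evaluates 27, then 9, then 3 configurations, so most of the budget is spent on the candidates that looked promising in cheap early rounds.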

Keywords