Mathematical Biosciences and Engineering (Jun 2023)

An innovative parameter optimization of Spark Streaming based on D3QN with Gaussian process regression

  • Hong Zhang,
  • Zhenchao Xu,
  • Yunxiang Wang,
  • Yupeng Shen

DOI
https://doi.org/10.3934/mbe.2023647
Journal volume & issue
Vol. 20, no. 8
pp. 14464 – 14486

Abstract

Spark Streaming, a computing framework built on Spark, is widely used to process streaming data such as social media data, IoT sensor data and web logs. Because streaming media data analysis is now so widely used, performance optimization for Spark Streaming has become a popular research topic. Existing methods for improving Spark Streaming's performance include task scheduling, resource allocation and data skew optimization, which primarily focus on how to manually tune the parameter configuration. However, tuning more than 200 parameters through repeated debugging is challenging and inefficient. In this paper, we propose an improved dueling double deep Q-network (DQN) technique for parameter tuning, which can significantly improve the performance of Spark Streaming. The approach combines reinforcement learning with Gaussian process regression to reduce the number of iterations and speed up convergence dramatically. The experimental results demonstrate that the dueling double DQN method with Gaussian process regression can improve performance by up to 30.24%.
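For readers unfamiliar with the two ingredients named in "dueling double DQN", the following minimal sketch shows the standard textbook forms of the dueling aggregation and the double DQN target. It is not the authors' implementation: the function names, the toy numbers and the plain-list representation of Q-values are assumptions made for illustration, and the Gaussian process regression component is omitted.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda i: q_online_next[i])
    return reward + gamma * q_target_next[best]


# Hypothetical toy values, not taken from the paper.
q_values = dueling_q(1.0, [1.0, 2.0, 3.0])
target = double_dqn_target(1.0, 0.5, [0.1, 0.9], [4.0, 2.0], done=False)
```

In a parameter-tuning setting, each action would presumably correspond to adjusting a configuration parameter, but how the paper encodes states, actions and rewards is not stated in the abstract.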

Keywords