Kongzhi Yu Xinxi Jishu (Dec 2022)

A Method of Performance Optimization for Distributed Stream Data Processing in Vehicle Data Peak Scenario

  • TANG Pengfei,
  • HU Weimin,
  • YANG Yongtao

DOI
https://doi.org/10.13889/j.issn.2096-5427.2022.06.014
Journal volume & issue
no. 6
pp. 91 – 98

Abstract

In rail transit, intelligent vehicle operation and maintenance systems use big data technologies to compute and analyze vehicle status data in real time. Through status monitoring and fault warning, such a system guides the maintenance of key equipment and improves the efficiency of vehicle operation and maintenance personnel. However, unstable network communication between the vehicle and the ground system causes data floods, and a big data computing framework running with its general-purpose configuration then hits a performance bottleneck. This paper analyzes the causes of the bottleneck in depth and proposes an optimization method that combines a user-defined Kafka partition strategy with tuned Spark Streaming processing parameters: data from different vehicles is written to different Kafka partitions, and the rate at which Spark Executors read data from Kafka is controlled. Results from an actual project deployment show that the method effectively resolves the bottleneck that arises when the processing rate of streaming data cannot keep up with the read rate in a data flood scenario.
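The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the abstract does not detail. The function names, the CRC32-based mapping, and the numeric values are assumptions chosen for illustration; only the two Spark configuration keys are standard Spark Streaming settings. A hash-based mapping keeps all records of one vehicle in a single partition, but two vehicles can still share a partition when there are more vehicles than partitions.

```python
import zlib

def partition_for_vehicle(vehicle_id: str, num_partitions: int) -> int:
    """Map a vehicle ID to a Kafka partition index (hypothetical helper).

    CRC32 is used because it is stable across processes, unlike Python's
    built-in hash() for strings.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(vehicle_id.encode("utf-8")) % num_partitions

# Standard Spark Streaming settings that limit how fast executors read
# from Kafka. The value 1000 is an illustrative assumption, not a figure
# taken from the paper.
SPARK_RATE_LIMIT_CONF = {
    "spark.streaming.backpressure.enabled": "true",
    "spark.streaming.kafka.maxRatePerPartition": "1000",
}

def max_records_per_batch(max_rate_per_partition: int,
                          num_partitions: int,
                          batch_interval_s: int) -> int:
    """Upper bound on records read in one micro-batch under the rate cap."""
    return max_rate_per_partition * num_partitions * batch_interval_s
```

With 8 partitions, a cap of 1000 records per second per partition, and a 5-second batch interval, a batch holds at most 1000 × 8 × 5 = 40000 records, so a backlog released after a network outage is drained over several batches instead of arriving in one oversized batch.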

Keywords