Jisuanji kexue yu tansuo (Jan 2022)

Optimization Method of Projection and Order for Multiple Tables Join

  • ZONG Fengbo, ZHAO Yuhai, WANG Guoren, JI Hangxu

DOI
https://doi.org/10.3778/j.issn.1673-9418.2009099
Journal volume & issue
Vol. 16, no. 1
pp. 106 – 119

Abstract

Multi-table join is a common operation in big data processing. As with ordinary join operations in databases, the order in which the tables are joined strongly affects the consumption of computing and transmission resources. Optimizing the join order of multiple tables is a classical optimization problem, and the size of the projected result of each join also affects the volume of data transmitted between nodes. The overall join order and the projection applied at each join therefore both have a significant impact on join efficiency. Traditional optimization strategies, however, often consider neither the choice of intermediate projection nor its influence on the optimal join strategy. To solve this problem, this paper establishes a connection relation index that, while the optimized join strategy is being constructed, adjusts the projection of each join, deletes redundant columns promptly, and reduces the consumption of transmission resources. In addition, an optimization strategy that adjusts the join order based on the projection relation reduces the consumption of transmission and computing resources as much as possible. The optimization strategy is implemented and tested in the Flink system, and the results show a significant optimization effect.
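The interaction between join order and intermediate projection described in the abstract can be illustrated with a toy cost model. The sketch below is not the paper's connection relation index or its Flink implementation; it is plain Python under assumed inputs. The table names, row counts, join selectivities, and the cost measure (rows times kept columns shipped into each join) are all hypothetical, chosen only to show that dropping columns no later join or the final output needs reduces the data moved, and that the join order changes the total.

```python
from itertools import permutations

# Hypothetical catalog: table name -> (row count, columns).
TABLES = {
    "orders":   (1000, {"o_id", "c_id", "p_id", "o_note"}),
    "customer": (100,  {"c_id", "c_name", "c_addr"}),
    "product":  (50,   {"p_id", "p_name", "p_desc"}),
}
# Hypothetical equi-join predicates: table pair -> (key column, selectivity).
JOINS = {
    frozenset({"orders", "customer"}): ("c_id", 0.005),
    frozenset({"orders", "product"}):  ("p_id", 0.02),
}
# Columns the query finally returns (assumed).
OUTPUT = {"c_name", "p_name"}


def keys_between(joined, others):
    """Join-key columns linking a table in `joined` to a table in `others`."""
    return {key for pair, (key, _) in JOINS.items()
            if pair & set(joined) and pair & set(others)}


def plan_cost(order, prune=True):
    """Cells (rows x columns) shipped into each join of a left-deep plan.

    Returns None when the order would require a cross product.
    """
    joined = [order[0]]
    rows, cols = TABLES[order[0]]
    cols = set(cols)
    cost = 0.0
    for k, table in enumerate(order[1:], start=1):
        links = [sel for pair, (_, sel) in JOINS.items()
                 if table in pair and pair & set(joined)]
        if not links:
            return None
        t_rows, t_cols = TABLES[table]
        t_cols = set(t_cols)
        if prune:
            rest = order[k + 1:]
            # Keep only output columns and keys for this or a later join.
            cols &= OUTPUT | keys_between(joined, [table, *rest])
            t_cols &= OUTPUT | keys_between([table], [*joined, *rest])
        cost += rows * len(cols) + t_rows * len(t_cols)
        for sel in links:
            rows *= sel
        rows *= t_rows
        cols |= t_cols
        joined.append(table)
    return cost


def best_plan(prune=True):
    """Cheapest left-deep order by exhaustive search (fine for a few tables)."""
    plans = [(plan_cost(p, prune), p) for p in permutations(TABLES)]
    return min((c, p) for c, p in plans if c is not None)


if __name__ == "__main__":
    print("with projection pruning:   ", best_plan(prune=True))
    print("without projection pruning:", best_plan(prune=False))
```

Exhaustive enumeration is used here only because the example has three tables; it grows factorially with the number of tables and stands in for whatever search procedure a real optimizer would apply.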

Keywords