大数据 (Jan 2025)
Query scheduling based on cloud-edge multi-data warehouse architecture and cost prediction model
Abstract
With the development of cloud computing and big data, traditional local data warehouses have become difficult to scale and process data inefficiently. As a result, data warehouses built on a cloud-edge architecture have emerged. In this architecture, data warehouses are distributed across the cloud center and the edge, which makes data storage and processing more flexible and supports requirements such as data security, data privacy, and cross-geographic data sharing while maintaining query efficiency. This paper designs a scheduling framework for cloud-edge multi-data warehouses that integrates a query cost prediction model built on machine learning, and implements cloud-edge collaborative execution and cloud-edge selective execution at multiple query granularities to improve the performance and query efficiency of the whole system. In addition, a multi-feature fusion and feature selection method is proposed to enrich the query cost information. The scheduling framework and optimization algorithm achieve significant performance improvements on the SSB and TPC-DS datasets, providing an effective solution for query scheduling under the cloud-edge multi-data warehouse architecture.
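As a rough illustration of cloud-edge selective execution driven by predicted cost, the sketch below routes each query to whichever site has the lowest estimated cost. It is not the paper's implementation: the names `Query`, `choose_site`, `edge_cost`, and `cloud_cost`, the feature names, and the toy linear cost formulas are all assumptions standing in for a trained machine learning cost model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Query:
    sql: str
    # Hypothetical query features; a real system would extract these
    # from the query plan and warehouse statistics.
    features: Dict[str, float]


# A cost model maps query features to an estimated cost (arbitrary units).
CostModel = Callable[[Dict[str, float]], float]


def choose_site(query: Query, models: Dict[str, CostModel]) -> str:
    """Return the name of the site with the lowest predicted cost."""
    return min(models, key=lambda site: models[site](query.features))


# Toy linear cost functions standing in for trained predictors (assumed).
def edge_cost(f: Dict[str, float]) -> float:
    # Edge nodes: no transfer overhead, but slower per million rows scanned.
    return 2.0 * f["rows_scanned_m"] + 1.0 * f["joins"]


def cloud_cost(f: Dict[str, float]) -> float:
    # Cloud center: faster scans, plus a fixed overhead for data transfer.
    return 0.5 * f["rows_scanned_m"] + 1.0 * f["joins"] + 5.0


models = {"edge": edge_cost, "cloud": cloud_cost}

small = Query("SELECT COUNT(*) FROM lineorder WHERE lo_quantity < 5",
              {"rows_scanned_m": 1.0, "joins": 1.0})
large = Query("SELECT COUNT(*) FROM lineorder",
              {"rows_scanned_m": 10.0, "joins": 3.0})

small_site = choose_site(small, models)
large_site = choose_site(large, models)
```

Under these assumed cost functions, the small query is routed to the edge and the large one to the cloud. Collaborative execution would instead split a single query across sites, which this sketch does not cover.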