大数据 (Jan 2025)

Research on graph-based heterogeneous data integration method

  • HUANG Yuezhen,
  • YANG Fen,
  • TIAN Feng,
  • ZHANG Chengye,
  • LI Yuchan

Journal volume & issue
Vol. 11
pp. 21 – 35

Abstract

Enterprise departments typically manage their data independently, and siloed ("chimney-style") system construction leaves data scattered across heterogeneous databases. This heterogeneity poses a series of challenges for data integration. To address the aggregation and fusion of data from heterogeneous enterprise systems, an end-to-end, graph-based data integration framework is proposed. First, tables and fields are organized into a network graph according to the primary and foreign key relationships of the relational data model, with table names and field names treated as different types of entities in the graph. Second, the constructed graph is fed into a graph neural network, and a vector representation of each node is obtained through graph convolution. From these node vectors, the node mapping between any two graphs to be matched can be computed. Third, once the tables and fields of the graphs are aligned, the field values are standardized, that is, the value of each cell is mapped to a standard value. Finally, the results of these steps are turned into executable database query statements to achieve heterogeneous data fusion. Experiments on real enterprise data show that the proposed framework improves the development efficiency of data integration, and that the model is not tied to a particular business domain and is highly portable.
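
The first and last steps of the pipeline described above (building a graph from tables, fields, and foreign keys, and emitting a query from an alignment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the sample schema, and the field mapping are all hypothetical, and the graph neural network and value standardization steps are omitted.

```python
from collections import defaultdict


def build_schema_graph(tables, foreign_keys):
    """Build an undirected schema graph (hypothetical helper).

    tables:       {table_name: [field_name, ...]}
    foreign_keys: [((table, field), (ref_table, ref_field)), ...]
    Returns (nodes, adjacency), where nodes maps a node id to its
    entity type ("table" or "field").
    """
    nodes = {}
    adjacency = defaultdict(set)

    def add_edge(a, b):
        adjacency[a].add(b)
        adjacency[b].add(a)

    for table, fields in tables.items():
        nodes[table] = "table"
        for field in fields:
            field_id = f"{table}.{field}"
            nodes[field_id] = "field"
            add_edge(table, field_id)  # a table is linked to each of its fields

    for (table, field), (ref_table, ref_field) in foreign_keys:
        # a foreign key field is linked to the key field it references
        add_edge(f"{table}.{field}", f"{ref_table}.{ref_field}")

    return nodes, dict(adjacency)


def build_select(source_table, field_mapping):
    """Render a SELECT that renames aligned source fields (hypothetical helper).

    field_mapping: {source_field: target_field}, assumed to come from
    an earlier alignment step that is not shown here.
    """
    columns = ", ".join(f"{src} AS {dst}" for src, dst in field_mapping.items())
    return f"SELECT {columns} FROM {source_table}"


# Hypothetical two-table schema, used only for illustration.
tables = {"customer": ["id", "name"], "orders": ["id", "cust_id", "amount"]}
foreign_keys = [(("orders", "cust_id"), ("customer", "id"))]

nodes, adjacency = build_schema_graph(tables, foreign_keys)
query = build_select("orders", {"cust_id": "customer_id", "amount": "total"})
```

In this sketch the table and field nodes carry a type label, mirroring the abstract's treatment of table names and field names as different entity types; a real system would pass such a graph to a graph neural network before any matching takes place.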

Keywords