Jisuanji kexue (May 2022)

Method on Multi-granularity Data Provenance for Data Fusion

  • YANG Fei-fei, SHEN Si-yu, SHEN De-rong, NIE Tie-zheng, KOU Yue

DOI
https://doi.org/10.11896/jsjkx.210300092
Journal volume & issue
Vol. 49, no. 5
pp. 120 – 128

Abstract

As the volume of data grows and data increasingly correlate and overlap with one another, data fusion is needed to maximize the value of the data. However, because the data fusion process is complex, a backtracking mechanism for data fusion must be established so that the process can be clearly explained. Although many studies address data provenance, most of them target queries and workflows, and few address data fusion. This paper focuses on provenance for data fusion and proposes a method that supports multi-granularity provenance. First, the data fusion process is abstracted, semantic graphs of schemas, entities and attributes are constructed with the entity at the core, and an optimized model for storing provenance information is proposed. Second, based on the semantic graph, data provenance query algorithms are proposed at the entity level and the attribute level, respectively, together with corresponding query optimization strategies. Finally, experiments demonstrate the effectiveness of the proposed data provenance method.
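The abstract does not specify the data model or the query algorithms. As a rough illustration only, the following Python sketch assumes an entity-centred graph in which schema nodes link to fused entities, and each fused attribute value links to the source records it was derived from. Under that assumption, attribute-level provenance is a direct lookup, and entity-level provenance is the union of the attribute-level provenance of that entity's attributes. All names (`SourceRecord`, `FusedEntity`, `ProvenanceGraph`, `record_fusion`, and the sample data) are hypothetical; this is not the authors' storage model or their query optimization strategy.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRecord:
    """A record in one source that contributed to a fused value (hypothetical)."""
    source: str     # hypothetical source identifier, e.g. "db_a"
    record_id: str  # key of the record within that source


@dataclass
class FusedEntity:
    entity_id: str
    schema: str
    values: dict[str, object] = field(default_factory=dict)  # attribute -> fused value


class ProvenanceGraph:
    """Entity-centred provenance store: schema -> entity -> attribute -> sources.

    Hypothetical sketch; only attribute-level edges are stored, and
    entity-level provenance is derived from them on demand.
    """

    def __init__(self) -> None:
        self.entities: dict[str, FusedEntity] = {}
        # schema node -> ids of entity nodes conforming to that schema
        self.schema_entities: defaultdict[str, set[str]] = defaultdict(set)
        # (entity id, attribute) -> source records that produced the fused value
        self.attr_sources: defaultdict[tuple[str, str], set[SourceRecord]] = defaultdict(set)

    def record_fusion(self, entity_id: str, schema: str, attribute: str,
                      value: object, sources: list[SourceRecord]) -> None:
        """Store one fused attribute value together with its contributing sources."""
        entity = self.entities.setdefault(entity_id, FusedEntity(entity_id, schema))
        entity.values[attribute] = value
        self.schema_entities[schema].add(entity_id)
        self.attr_sources[(entity_id, attribute)].update(sources)

    def attribute_provenance(self, entity_id: str, attribute: str) -> set[SourceRecord]:
        """Attribute-level query: the sources behind a single fused value."""
        # .get() avoids inserting empty entries for unknown keys
        return set(self.attr_sources.get((entity_id, attribute), set()))

    def entity_provenance(self, entity_id: str) -> set[SourceRecord]:
        """Entity-level query: union of the sources of all the entity's attributes."""
        entity = self.entities.get(entity_id)
        if entity is None:
            return set()
        result: set[SourceRecord] = set()
        for attribute in entity.values:
            result |= self.attr_sources[(entity_id, attribute)]
        return result


# Usage with made-up sample data
a = SourceRecord("db_a", "17")
b = SourceRecord("db_b", "x9")
g = ProvenanceGraph()
g.record_fusion("p1", "Person", "name", "Alice Smith", [a, b])
g.record_fusion("p1", "Person", "email", "alice@example.org", [b])
email_sources = g.attribute_provenance("p1", "email")  # only db_b contributed the email
person_sources = g.entity_provenance("p1")             # db_a and db_b both contributed
```

Deriving entity-level provenance from the attribute-level edges, instead of storing it separately, is one simple way to avoid duplicating provenance information; the abstract does not say whether the paper's optimized storage model takes this approach.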

Keywords