Jisuanji kexue yu tansuo (Oct 2021)
Research on Increased Data Repair with Confidence Value Token
Abstract
In the era of big data, data hold great value and have become an important strategic resource in today's information society. However, large amounts of inconsistent data arise during data update and management, causing unpredictable side effects for enterprises. Existing repair methods based on functional dependencies fall into three kinds. The first two rely heavily on master data or on tuple confidence values provided by the enterprise, which are hard to obtain in real applications. The third, based on the minimal deletion principle, causes information loss. Moreover, when resolving a conflict on a functional dependency [X→Y], existing methods only support modifying the Y attribute. To address these shortcomings in the setting where tuple confidence is missing, this paper proposes an increased data repair method with confidence value tokens, which consists of two parts. The first part generates confidence value tokens automatically by analyzing operator logs and knowledge rules. The second part is an increased repair strategy that decides whether to repair the X or the Y attribute according to the confidence value tokens. The target value used to repair dirty data is then chosen with the help of conditional probability. Experimental results show that the proposed method has high reliability and scalability.
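The idea of repairing either side of a violated dependency [X→Y] can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the per-cell `confidence` dictionary, the majority-vote fallback for Y, and the sample data are all assumptions made for illustration, and the confidence values are supplied by hand here rather than derived from logs and knowledge rules.

```python
from collections import Counter, defaultdict


def find_violations(rows, x, y):
    """Group row indices by their X value and keep groups whose Y values disagree."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[x]].append(i)
    return {xv: idx for xv, idx in groups.items()
            if len({rows[i][y] for i in idx}) > 1}


def most_probable(rows, given_attr, given_value, target_attr, exclude):
    """Value of target_attr with the highest empirical P(target | given),
    estimated from all rows except the one being repaired."""
    counts = Counter(r[target_attr] for i, r in enumerate(rows)
                     if i != exclude and r[given_attr] == given_value)
    return counts.most_common(1)[0][0] if counts else None


def repair(rows, x, y, confidence):
    """Return a repaired copy of rows for the dependency x -> y.

    confidence maps (row_index, attribute) to a score in [0, 1];
    missing entries default to 1.0 (assumption of this sketch).
    """
    repaired = [dict(r) for r in rows]
    for idx in find_violations(rows, x, y).values():
        majority_y = Counter(rows[i][y] for i in idx).most_common(1)[0][0]
        for i in idx:
            if rows[i][y] == majority_y:
                continue  # this tuple agrees with the rest of its group
            if confidence.get((i, x), 1.0) < confidence.get((i, y), 1.0):
                # X is the less trusted cell: pick the X value most likely given Y.
                new_x = most_probable(rows, y, rows[i][y], x, exclude=i)
                if new_x is not None:
                    repaired[i][x] = new_x
            else:
                # Y is the less trusted cell: align it with the group's majority Y.
                repaired[i][y] = majority_y
    return repaired


# Hypothetical data for the dependency zip -> city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Boston"},   # zip is the suspect cell
    {"zip": "02108", "city": "Boston"},
    {"zip": "02108", "city": "Boston"},
    {"zip": "10001", "city": "Newark"},   # city is the suspect cell
]
confidence = {
    (2, "zip"): 0.2, (2, "city"): 0.9,
    (5, "zip"): 0.9, (5, "city"): 0.3,
}
repaired = repair(rows, "zip", "city", confidence)
```

In this toy input, row 2 has its X attribute (`zip`) changed while row 5 has its Y attribute (`city`) changed, which is the behavior that a Y-only repair strategy cannot express.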
Keywords