IEEE Access (Jan 2021)
Digital Watermarking for Anonymized Data With Low Information Loss
Abstract
At present, massive amounts of data are utilized for artificial intelligence technologies such as machine learning and deep learning. However, these data must be used carefully to preserve privacy. Data anonymization is a technique that enables both data mining and privacy protection: it prevents the identification of individuals by generalizing the data so that multiple records share the same values. In this study, we consider a data-publishing infrastructure for personal data sharing. The infrastructure anonymizes data before publishing it to users in order to protect privacy; however, the problem of unauthorized republishing by malicious users must also be considered. To address this issue, we studied digital watermarking methods that link data users to the anonymized data they receive. Our previous method embedded information indicating the original user so that illegally republished data could be detected; however, it did not address information loss. This study proposes another digital watermarking method for anonymized data that achieves low information loss. The proposed method replaces values in tuples to embed information. To reduce the information loss caused by embedding, the method selects replacement values from candidates whose meanings are similar to the original value. We propose the use of vector-conversion tables to select these replacement values. The proposed method also extends the maximum length of the embedded bit string by embedding multiple bits into a single tuple. To evaluate its efficacy, we also measured its tolerance to distortion attacks. The proposed method is non-blind, i.e., the data before watermarking are required to perform extraction.
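The abstract does not specify how the vector-conversion tables are built or how bits are mapped to replacement values. The following is only a minimal sketch of the general idea: non-blind, multi-bit embedding by replacing a value with one of several semantically similar candidates. It assumes a hypothetical hand-written table `CANDIDATES` in place of vector-conversion tables, a fixed `BITS_PER_TUPLE = 2`, and illustrative helper names `embed` and `extract`. None of these come from the paper.

```python
from typing import Dict, List, Sequence

# HYPOTHETICAL candidate table: each generalized value maps to 2**k similar
# alternatives (index 0 is the value itself). It stands in for the paper's
# vector-conversion tables, whose construction is not described here.
CANDIDATES: Dict[str, List[str]] = {
    "20-29": ["20-29", "20-24", "25-29", "15-29"],
    "Tokyo": ["Tokyo", "Kanagawa", "Saitama", "Chiba"],
}
BITS_PER_TUPLE = 2  # assumed: k bits select one of 2**k candidates

assert all(len(c) == 2 ** BITS_PER_TUPLE for c in CANDIDATES.values())


def embed(values: Sequence[str], bits: str, k: int = BITS_PER_TUPLE) -> List[str]:
    """Embed k bits per tuple by replacing its value with a similar candidate."""
    out, pos = [], 0
    for v in values:
        cands = CANDIDATES.get(v)
        if cands is None or pos + k > len(bits):
            out.append(v)  # no candidates or no bits left: keep the value
            continue
        out.append(cands[int(bits[pos:pos + k], 2)])
        pos += k
    return out


def extract(original: Sequence[str], marked: Sequence[str], n_bits: int,
            k: int = BITS_PER_TUPLE) -> str:
    """Non-blind extraction: compare each marked value with the original's candidates."""
    bits = ""
    for o, m in zip(original, marked):
        cands = CANDIDATES.get(o)
        if cands is None or len(bits) + k > n_bits:
            continue  # mirrors the skip rule used in embed()
        # A value outside the candidate list (e.g. after a distortion attack)
        # cannot be decoded, so its k bits are marked as unknown.
        bits += format(cands.index(m), f"0{k}b") if m in cands else "?" * k
    return bits
```

For example, `embed(["20-29", "Tokyo", "20-29", "Osaka"], "101101")` replaces only the values that have candidates, and `extract` recovers `"101101"` from the original and marked lists. Because index 0 is the original value itself, a bit pair of `00` leaves a tuple unchanged. Every other choice substitutes a nearby value rather than an arbitrary one, which is the intuition behind low information loss.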
Keywords