大数据 (Big Data), Nov 2024

Semantic-based robust text watermarking algorithm

  • ZHANG Kun,
  • LI Bo,
  • CHEN Xi,
  • YANG Xiaoyi,
  • WU Le,
  • HONG Richang

Journal volume & issue
Vol. 10
pp. 49–61

Abstract


Text watermarking can establish the copyright ownership of text data, facilitating the secure circulation and sharing of data. Existing text watermarking algorithms typically pre-mark candidate words and embed watermarks through word substitution. However, these algorithms mark candidate words based only on the hash value of the previous word, which limits their robustness. To address this issue, a semantic-based robust text watermarking (SRTW) algorithm was proposed. Specifically, semantic embeddings of the text were first obtained using existing embedding models. These embeddings were then converted into word markers (-1 or 1) by a trained word marking model. Finally, words marked as 1 were selected to replace the original words, thereby embedding the watermark. Compared with existing state-of-the-art baseline algorithms, SRTW improves the AUC by 2.08%, 5.17%, and 3.09% in three different attack scenarios, respectively, demonstrating its effectiveness.
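The embed, mark, and substitute pipeline summarized above can be illustrated with a toy sketch. This is not the authors' SRTW implementation; every name and number below is a hypothetical assumption: the hand-made 2-d vectors in `EMBED` stand in for an existing embedding model, the fixed linear `WEIGHTS` stand in for the trained word marking model, the "context" is assumed to be the previous word's embedding, and the synonym groups and the `detect` score (fraction of marked positions) are illustrative choices. A hash-based `hash_marker` is included only to contrast with the baseline idea of marking from the previous word's hash.

```python
import hashlib
import math

# Hypothetical 2-d "semantic" embeddings (stand-in for a real embedding model).
EMBED = {
    "the": (1.0, 0.0), "dog": (0.9, -0.2),
    "ran": (-0.6, 0.5), "sprinted": (-0.5, 0.6),
    "big": (0.35, 0.65), "large": (0.7, 0.3), "huge": (0.3, 0.8),
    "fast": (0.45, 0.55), "quickly": (0.35, 0.7), "rapidly": (0.8, 0.25),
}
# Hypothetical candidate-substitution groups.
SYNONYMS = [("big", "large", "huge"), ("fast", "quickly", "rapidly")]
# Fixed weights standing in for the trained word marking model (an assumption):
# the first two act on the context embedding, the last two on the candidate.
WEIGHTS = (0.5, 0.5, 2.0, -2.0)


def group_of(word):
    """Return the synonym group containing `word`, or None."""
    for group in SYNONYMS:
        if word in group:
            return group
    return None


def semantic_marker(prev_word, cand):
    """Map (context embedding, candidate embedding) to a word marker in {-1, 1}."""
    features = EMBED[prev_word] + EMBED[cand]  # tuple concatenation
    score = sum(w * x for w, x in zip(WEIGHTS, features))
    return 1 if score >= 0.0 else -1


def hash_marker(prev_word, cand):
    """Baseline-style marker: depends only on a hash of the previous word and candidate."""
    digest = hashlib.sha256(f"{prev_word}|{cand}".encode()).digest()
    return 1 if digest[0] % 2 == 0 else -1


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))


def embed_watermark(text):
    """Replace substitutable words so that they carry marker 1 where possible."""
    words = text.split()
    for i in range(1, len(words)):
        group = group_of(words[i])
        if group is None or words[i - 1] not in EMBED:
            continue
        orig = words[i]
        if semantic_marker(words[i - 1], orig) == 1:
            continue  # the original word already carries marker 1
        marked = [c for c in group
                  if c != orig and semantic_marker(words[i - 1], c) == 1]
        if marked:
            # Prefer the marked candidate closest in meaning to the original word.
            words[i] = max(marked, key=lambda c: cosine(EMBED[c], EMBED[orig]))
    return " ".join(words)


def detect(text, marker=semantic_marker):
    """Fraction of substitutable positions whose word carries marker 1."""
    words = text.split()
    marks = [marker(words[i - 1], words[i]) for i in range(1, len(words))
             if group_of(words[i]) and words[i - 1] in EMBED]
    return sum(m == 1 for m in marks) / len(marks) if marks else 0.0


if __name__ == "__main__":
    original = "the big dog ran fast"
    wm = embed_watermark(original)
    print(wm)                                # the large dog ran rapidly
    print(detect(original), detect(wm))      # 0.0 1.0
    # Attack: replace the context word with a semantically close word.
    attacked = " ".join("sprinted" if w == "ran" else w for w in wm.split())
    print(detect(attacked))                  # 1.0
    print(detect(wm, hash_marker), detect(attacked, hash_marker))
```

In this toy example, replacing the context word "ran" with the semantically close "sprinted" leaves the semantic markers unchanged, because the marker is a function of embeddings rather than of an exact hash; `hash_marker` offers no such guarantee, since any change to the previous word yields an unrelated hash. A practical detector would turn the count of marked words into a statistical score (for example, a z-test) and would be evaluated by AUC across attack scenarios, as in the abstract.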

Keywords