IEEE Access (Jan 2024)

Coexistence of Deepfake Defenses: Addressing the Poisoning Challenge

  • Jaewoo Park
  • Leo Hyun Park
  • Hong Eun Ahn
  • Taekyoung Kwon

DOI: https://doi.org/10.1109/ACCESS.2024.3353785
Journal volume & issue: Vol. 12, pp. 11674–11687

Abstract

As Generative Adversarial Networks advance, deepfakes have become increasingly realistic, thereby escalating societal, economic, and political threats. In confronting these heightened risks, the research community has identified two promising defensive strategies: proactive deepfake disruption and reactive deepfake detection. Typically, proactive and reactive defenses coexist, each addressing the shortcomings of the other. However, this paper brings to the fore a critical yet overlooked issue associated with the simultaneous deployment of these deepfake countermeasures. Genuine images gathered from the Internet, already imbued with disrupting perturbations, can lead to data poisoning in the training datasets of deepfake detection models, thereby severely degrading detection accuracy. We propose an improved training framework to address this problem in deepfake detection models. Our approach purifies the disrupting perturbations in disrupted images using the backward process of the denoising diffusion probabilistic model (DDPM). Images purified with our DDPM-based technique closely mimic the original, unperturbed images, thereby enabling the successful generation of deepfake images for training purposes. Moreover, our purification process outperforms DiffPure, a prominent adversarial purification method, in terms of speed. While conventional defensive techniques struggle to preserve detection accuracy in the face of a poisoned training dataset, our framework markedly reduces this accuracy drop, thus achieving superior performance across a range of detection models. Our experiments demonstrate that deepfake detection models trained using our framework exhibit an increase in detection accuracy ranging from 11.24%p to 45.72%p when compared to models trained with the DiffPure method. Our implementation is available at https://github.com/seclab-yonsei/Anti-disrupt.
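For context, the sketch below illustrates the general idea of diffusion-based purification as the abstract describes it: an image carrying disrupting perturbations is partially noised with the DDPM forward process and then denoised with the backward (reverse) process, washing out the perturbation while largely preserving image content. The noise predictor `eps_model`, the cut-off timestep `t_star`, and the linear beta schedule are illustrative assumptions, not the paper's exact configuration; see the linked repository for the actual implementation.

```python
# Minimal sketch of DDPM-based purification (illustrative, not the paper's code):
# partially diffuse a possibly disrupted image, then run the backward process.
import torch

T = 1000                                   # total diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # assumed linear beta schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t

def eps_model(x_t, t):
    # Placeholder noise predictor eps_theta(x_t, t); a real setup would load
    # a pretrained diffusion UNet checkpoint here.
    return torch.zeros_like(x_t)

@torch.no_grad()
def purify(x0, t_star=100):
    """Diffuse x0 forward to step t_star, then denoise back to an x0 estimate."""
    # Forward process: x_{t*} = sqrt(abar) * x0 + sqrt(1 - abar) * noise
    noise = torch.randn_like(x0)
    abar = alpha_bars[t_star - 1]
    x = abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise

    # Backward process from t_star down to 1 (standard DDPM sampling step)
    for t in reversed(range(1, t_star + 1)):
        idx = t - 1
        eps = eps_model(x, torch.full((x.shape[0],), t, dtype=torch.long))
        coef = betas[idx] / (1.0 - alpha_bars[idx]).sqrt()
        mean = (x - coef * eps) / alphas[idx].sqrt()
        if t > 1:
            x = mean + betas[idx].sqrt() * torch.randn_like(x)  # sigma_t^2 = beta_t
        else:
            x = mean
    return x.clamp(-1.0, 1.0)

# Example: purify a batch of disrupted face crops scaled to [-1, 1]
disrupted = torch.rand(4, 3, 256, 256) * 2 - 1
clean_estimate = purify(disrupted, t_star=100)
```

In a scheme like this, the number of reverse steps (here `t_star`) largely determines purification cost, which is the kind of lever that makes a purifier faster or slower than alternatives such as DiffPure.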

Keywords