IET Image Processing (Dec 2024)
Learning spatial‐frequency interaction for generalizable deepfake detection
Abstract
In recent years, face forgery detection has gained significant attention, resulting in considerable advancements. However, most existing methods rely on CNNs to extract artefacts from the spatial domain, overlooking the pervasive frequency-domain artefacts present in deepfake content, which makes robust and generalized detection difficult. To address these issues, we propose a dual-stream frequency-spatial fusion network for deepfake detection. The network consists of three components: the spatial forgery feature extraction module, the frequency forgery feature extraction module, and the spatial-frequency feature fusion module. The spatial forgery feature extraction module employs spatial-channel attention to extract spatial-domain features, targeting artefacts in the spatial domain. The frequency forgery feature extraction module leverages focused linear attention to detect frequency-domain anomalies in internal regions, enabling the identification of generated content. The spatial-frequency feature fusion module then fuses the forgery features extracted from both domains, facilitating accurate detection of both splicing artefacts and internally generated forgeries and enhancing the model's ability to capture forgery characteristics accurately. Extensive experiments on several widely used benchmarks demonstrate that our carefully designed network exhibits superior generalization and robustness, significantly improving deepfake detection performance.
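As a rough illustration of the dual-stream design described above, the following is a minimal PyTorch sketch. It assumes FFT log-magnitudes as the frequency representation, a CBAM-style spatial-channel attention, and generic linear attention standing in for the paper's focused linear attention; all module names and sizes are illustrative, not the authors' implementation.

```python
# Minimal sketch of a dual-stream frequency-spatial fusion detector.
# Assumptions (not from the paper): FFT log-magnitude frequency input,
# squeeze-excite + spatial-gate attention, elu(.)+1 linear attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialChannelAttention(nn.Module):
    """Channel attention (squeeze-excite) followed by a spatial gate."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        ca = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * ca                               # reweight channels
        sa = torch.sigmoid(self.spatial_conv(    # gate informative locations
            torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)))
        return x * sa

class LinearAttention(nn.Module):
    """Generic linear attention: softmax replaced by elu(.)+1 feature maps."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, N, C) token sequence
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = F.elu(q) + 1, F.elu(k) + 1        # positive kernel feature maps
        kv = torch.einsum('bnc,bnd->bcd', k, v)  # O(N) instead of O(N^2)
        z = 1 / (torch.einsum('bnc,bc->bn', q, k.sum(1)) + 1e-6)
        return self.proj(torch.einsum('bnc,bcd,bn->bnd', q, kv, z))

class DualStreamDetector(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.spatial_stem = nn.Conv2d(3, dim, 3, stride=2, padding=1)
        self.spatial_attn = SpatialChannelAttention(dim)
        self.freq_stem = nn.Conv2d(3, dim, 3, stride=2, padding=1)
        self.freq_attn = LinearAttention(dim)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)   # concat-and-project fusion
        self.head = nn.Linear(dim, 2)            # real vs. fake logits

    def forward(self, x):
        # Spatial stream: convolutional features + spatial-channel attention.
        s = self.spatial_attn(self.spatial_stem(x))
        # Frequency stream: log-magnitude spectrum as a crude frequency view.
        f = torch.log1p(torch.abs(torch.fft.fft2(x, norm='ortho')))
        f = self.freq_stem(f)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)    # (B, H*W, C)
        f = self.freq_attn(tokens).transpose(1, 2).view(b, c, h, w)
        # Fuse both domains, pool, and classify.
        fused = self.fuse(torch.cat([s, f], dim=1)).mean(dim=(2, 3))
        return self.head(fused)

logits = DualStreamDetector()(torch.randn(2, 3, 224, 224))  # -> shape (2, 2)
```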
Keywords