An Attention-Based Framework for Detecting Face Forgeries: Integrating Efficient-ViT and Wavelet Transform

Yinfei Xiao; Yanbing Zhou; Pengzhan Cheng; Leqian Ni; Xusheng Wu; Tianxiang Zheng

doi:10.3390/math13162576

Mathematics (Aug 2025)

Affiliations

Yinfei Xiao: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China
Yanbing Zhou: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China
Pengzhan Cheng: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China
Leqian Ni: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China
Xusheng Wu: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China
Tianxiang Zheng: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China

DOI: https://doi.org/10.3390/math13162576
Journal volume & issue: Vol. 13, no. 16
p. 2576

Abstract

As face forgery techniques, particularly DeepFake methods, continue to advance, effectively detecting manipulations that produce hyper-realistic faces has become essential for mitigating security threats. Current spatial-domain approaches often struggle to generalize across different forgery methods and compression artifacts. Frequency-based analyses show promise in identifying subtle local cues, but their lack of global context limits how well detection methods generalize. This study introduces a hybrid architecture that integrates Efficient-ViT with a multi-level wavelet transform and dynamically fuses spatial and frequency features through a dynamic adaptive multi-branch attention (DAMA) mechanism, strengthening the interaction between the two modalities. We also design a joint loss function and a training strategy that address data imbalance and improve the training process. Experiments on FaceForensics++ and Celeb-DF (V2) validate the effectiveness of our approach, which attains 97.07% accuracy in intra-dataset evaluation and a 74.7% AUC in cross-dataset evaluation, surpassing our Efficient-ViT baseline by 14.1% and 7.7%, respectively. The ablation study and parameter analysis indicate that the approach generalizes well across datasets and forgery methods, and that an orthogonal loss, which regularizes the feature space, effectively reduces feature redundancy.
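The abstract names a multi-level wavelet transform as the source of frequency features and an orthogonal loss for reducing feature redundancy, but does not give their exact form. As a minimal sketch, assuming a 2D Haar wavelet applied to a single-channel image, and assuming the orthogonal loss penalizes the squared cosine similarity between paired spatial and frequency feature vectors, the two ingredients could look like this in Python with NumPy. The function names `haar_dwt2`, `multilevel_haar`, and `orthogonal_penalty` are hypothetical and are not the authors' implementation; the DAMA fusion and the Efficient-ViT backbone are not modeled.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar transform on an (H, W) array with even H, W.

    Returns the low-frequency approximation band and the three detail bands.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right pixel
    c = x[1::2, 0::2]  # bottom-left pixel
    d = x[1::2, 1::2]  # bottom-right pixel
    ll = (a + b + c + d) / 2.0     # local average (approximation)
    horiz = (a - b + c - d) / 2.0  # left-right differences
    vert = (a + b - c - d) / 2.0   # top-bottom differences
    diag = (a - b - c + d) / 2.0   # diagonal differences
    return ll, (horiz, vert, diag)


def multilevel_haar(x, levels):
    """Apply haar_dwt2 repeatedly to the approximation band.

    Returns the final approximation band and a list of detail-band tuples,
    finest level first.
    """
    x = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if x.shape[0] % 2 or x.shape[1] % 2:
            raise ValueError("each level needs an even height and width")
        x, bands = haar_dwt2(x)
        details.append(bands)
    return x, details


def orthogonal_penalty(spatial, freq, eps=1e-8):
    """Hypothetical redundancy penalty (an assumption, not the paper's formula):
    mean squared cosine similarity between paired rows of two (N, D) feature arrays.
    """
    s = spatial / (np.linalg.norm(spatial, axis=1, keepdims=True) + eps)
    f = freq / (np.linalg.norm(freq, axis=1, keepdims=True) + eps)
    cos = np.sum(s * f, axis=1)  # per-sample cosine similarity
    return float(np.mean(cos ** 2))  # 0 when every pair is orthogonal
```

For example, `multilevel_haar(np.arange(64.0).reshape(8, 8), levels=2)` returns a 2x2 approximation band plus two levels of detail bands. Because the Haar basis used here is orthonormal, the total squared energy of all bands equals that of the input, so each level only redistributes information between coarse and fine frequency components.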

Published in Mathematics

ISSN: 2227-7390 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/mathematics