Egyptian Informatics Journal (Sep 2024)
Deepfake detection: Enhancing performance with spatiotemporal texture and deep learning feature fusion
Abstract
Deepfakes raise critical ethical issues around consent, authenticity, and the manipulation of digital content. Identifying Deepfake videos is one step towards fighting their malicious uses. While previous works have introduced accurate methods for Deepfake detection, the stability of the proposed methods is rarely discussed. This paper addresses the problem of building a stable model for Deepfake detection whose results are reproducible; in other words, if other researchers repeat the same experiments, the results should not differ. The proposed technique combines multiple spatiotemporal texture features with deep learning-based features. An enhanced 3D Convolutional Neural Network, which contains a spatiotemporal attention layer, is employed in a Siamese architecture. Analyses are carried out on the control parameters, feature importance, and reproducibility of results. Our technique is tested on four datasets: Celeb-DF, FaceForensics++, DeepfakeTIMIT, and FaceShifter. The results demonstrate that a Siamese architecture improves the accuracy of 3D Convolutional Neural Networks by 7.9 % and reduces the standard deviation of accuracy to 0.016, indicating reproducible results. Furthermore, adding texture features boosts accuracy to as high as 91.96 %. The final model achieves an Area Under the Curve (AUC) of up to 97.51 % and 95.44 % in same-dataset and cross-dataset scenarios, respectively. The main contributions of this work are improved model stability and repeatable results, delivering consistently high detection accuracy.
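To make the described architecture concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of a Siamese 3D CNN with a simple spatiotemporal attention layer, whose video embedding is fused with a precomputed texture-feature vector before classification. All layer sizes, the absolute-difference fusion, and the texture dimension (here 59, e.g. a uniform-LBP histogram) are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    """Learns a per-location (T, H, W) attention map over 3D feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, x):                      # x: (B, C, T, H, W)
        weights = torch.sigmoid(self.attn(x))  # (B, 1, T, H, W)
        return x * weights                     # re-weight spatiotemporal features

class Branch3DCNN(nn.Module):
    """Shared 3D-CNN branch of the Siamese network (illustrative depth)."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            SpatioTemporalAttention(64),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, clip):                   # clip: (B, 3, T, H, W)
        h = self.features(clip).flatten(1)     # (B, 64)
        return self.fc(h)                      # (B, embed_dim)

class SiameseDeepfakeDetector(nn.Module):
    """Compares two clips with a shared branch and fuses texture features."""
    def __init__(self, embed_dim=128, texture_dim=59):
        super().__init__()
        self.branch = Branch3DCNN(embed_dim)   # weights shared across both inputs
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim + texture_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),                  # real-vs-fake logit
        )

    def forward(self, clip_a, clip_b, texture_feats):
        emb_a = self.branch(clip_a)
        emb_b = self.branch(clip_b)
        diff = torch.abs(emb_a - emb_b)        # assumed similarity-style fusion
        fused = torch.cat([diff, texture_feats], dim=1)
        return self.classifier(fused)

# Example forward pass with dummy tensors (batch of 2, 8-frame 64x64 clips).
model = SiameseDeepfakeDetector()
a = torch.randn(2, 3, 8, 64, 64)
b = torch.randn(2, 3, 8, 64, 64)
tex = torch.randn(2, 59)                       # placeholder texture descriptors
print(model(a, b, tex).shape)                  # torch.Size([2, 1])
```

The sketch only shows how the three ingredients named in the abstract (Siamese weight sharing, spatiotemporal attention, and texture-feature fusion) can fit together; the paper's actual backbone, fusion rule, and texture descriptors may differ.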