EURASIP Journal on Image and Video Processing (Aug 2023)

Multi-attention-based approach for deepfake face and expression swap detection and localization

  • Saima Waseem,
  • Syed Abdul Rahman Syed Abu-Bakar,
  • Zaid Omar,
  • Bilal Ashfaq Ahmed,
  • Saba Baloch,
  • Adel Hafeezallah

DOI
https://doi.org/10.1186/s13640-023-00614-z
Journal volume & issue
Vol. 2023, no. 1
pp. 1 – 21

Abstract

Advancements in facial manipulation technology have resulted in highly realistic and indistinguishable face and expression swap videos. However, this has also raised concerns regarding the security risks associated with deepfakes. In the field of multimedia forensics, the detection and precise localization of image forgery have become essential tasks. Current deepfake detectors perform well with high-quality faces within specific datasets, but often struggle to maintain their performance when evaluated across different datasets. To this end, we propose an attention-based multi-task approach to improve feature maps for classification and localization tasks. The encoder and the attention-based decoder of our network generate localized maps that highlight regions with information about the type of manipulation. These localized features are shared with the classification network, improving its performance. Instead of using encoded spatial features, attention-based localized features from the decoder's first layer are combined with frequency domain features to create a discriminative representation for deepfake detection. Through extensive experiments on face and expression swap datasets, we demonstrate that our method achieves competitive performance in comparison to state-of-the-art deepfake detection approaches in both in-dataset and cross-dataset scenarios. Code is available at https://github.com/saimawaseem/Multi-Attention-Based-Approach-for-Deepfake-Face-and-Expression-Swap-Detection-and-Localization.
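
The abstract describes the architecture only at a high level; the sketch below is a minimal, hypothetical PyTorch rendering of that multi-task design, with an encoder, an attention-weighted decoder whose first-stage features are shared with the classifier, and channel-wise fusion of those features with frequency-domain cues. All module names, layer sizes, and the FFT-based fusion shown here are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Hypothetical sketch of the multi-task encoder/attention-decoder idea from the
# abstract. Layer sizes and the frequency-feature fusion are assumptions.
import torch
import torch.nn as nn


class MultiTaskDeepfakeNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Encoder: downsamples the face image into spatial feature maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Spatial attention applied before decoding, so decoded features
        # emphasize likely manipulated regions.
        self.attention = nn.Sequential(nn.Conv2d(64, 64, 1), nn.Sigmoid())
        # First decoder stage: its output is shared with the classifier.
        self.decoder_first = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU()
        )
        # Remaining decoder stage: per-pixel forgery localization map.
        self.decoder_rest = nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1)
        # Classifier consumes decoder features fused with 3 frequency channels.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32 + 3, num_classes)
        )

    def forward(self, x: torch.Tensor):
        feats = self.encoder(x)
        attended = feats * self.attention(feats)       # attention weighting
        loc_feats = self.decoder_first(attended)       # shared with classifier
        localization_map = torch.sigmoid(self.decoder_rest(loc_feats))
        # Frequency-domain cue: magnitude spectrum of the input (assumption),
        # resized to match the decoder features and fused channel-wise.
        freq = torch.fft.fft2(x).abs()
        freq = nn.functional.adaptive_avg_pool2d(freq, loc_feats.shape[-2:])
        fused = torch.cat([loc_feats, freq], dim=1)
        logits = self.classifier(fused)
        return logits, localization_map


if __name__ == "__main__":
    model = MultiTaskDeepfakeNet()
    logits, mask = model(torch.randn(2, 3, 256, 256))
    print(logits.shape, mask.shape)  # (2, 2) and (2, 1, 256, 256)
```

A forward pass returns both the classification logits and a per-pixel localization map, reflecting the joint detection-and-localization objective summarized in the abstract.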

Keywords