Learning Self-distilled Features for Facial Deepfake Detection Using Visual Foundation Models: General Results and Demographic Analysis

Yan Martins Braz Gurevitz Cunha; Bruno Rocha Gomes; José Matheus C. Boaro; Daniel de Sousa Moraes; Antonio José Grandson Busson; Julio Cesar Duarte; Sérgio Colcher

doi:10.5753/jis.2024.4120

Journal on Interactive Systems (Jul 2024)

Affiliations

Yan Martins Braz Gurevitz Cunha: Telemidia Lab. – Pontifical Catholic University of Rio de Janeiro
Bruno Rocha Gomes: Telemidia Lab. – Pontifical Catholic University of Rio de Janeiro
José Matheus C. Boaro: Telemidia Lab. – Pontifical Catholic University of Rio de Janeiro
Daniel de Sousa Moraes: Telemidia Lab. – Pontifical Catholic University of Rio de Janeiro
Antonio José Grandson Busson: BTG Pactual
Julio Cesar Duarte: Military Institute of Engineering
Sérgio Colcher: Telemidia Lab. – Pontifical Catholic University of Rio de Janeiro

DOI: https://doi.org/10.5753/jis.2024.4120
Journal volume & issue: Vol. 15, no. 1

Abstract

Modern deepfake techniques produce highly realistic false media content with the potential to spread harmful information, including fake news and incitements to violence. Deepfake detection methods aim to identify and counteract such content by employing machine learning algorithms, focusing mainly on detecting the presence of manipulation using spatial and temporal features. These methods often utilize Foundation Models trained on extensive unlabeled data through self-supervised approaches. This work extends previous research on deepfake detection, focusing on the effectiveness of these models while also considering biases, particularly concerning age, gender, and ethnicity, for ethical analysis. Experiments used DINOv2, a novel Vision Transformer-based Foundation Model, with training on the diverse Deepfake Detection Challenge Dataset, which encompasses several lighting conditions, resolutions, and demographic attributes. Combined with a CNN classifier, this approach demonstrated improved deepfake detection, with minimal bias towards these demographic characteristics.

Published in Journal on Interactive Systems

ISSN: 2763-7719 (Online)
Publisher: Brazilian Computer Society
Country of publisher: Brazil
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://sol.sbc.org.br/journals/index.php/jis/