Journal of King Saud University: Computer and Information Sciences (Oct 2023)

A novel model compression method based on joint distillation for deepfake video detection

  • Xiong Xu,
  • Shuai Tang,
  • Mingcheng Zhu,
  • Peisong He,
  • Sirui Li,
  • Yun Cao

Journal volume & issue
Vol. 35, no. 9
p. 101792

Abstract


In recent years, deepfake videos have been abused to create fake news, threatening the integrity of digital videos. Although existing detection methods leverage cumbersome neural networks to achieve promising detection performance, they cannot be deployed in resource-constrained scenarios. To overcome this limitation, we propose a novel model compression framework based on joint distillation for deepfake detection, which comprises a pre-training stage and a knowledge transfer stage. In the pre-training stage, a teacher network is trained with sufficient labeled samples. Then, in the knowledge transfer stage, a lightweight student network is constructed with dimension alignment in mind. To transfer forensics knowledge comprehensively, a joint distillation loss is designed, combining a cross-entropy loss, a knowledge distillation loss, and a gradient-guided feature distillation loss. For feature distillation, feature maps from both shallow and deep layers are used to compute a channel-wise mean squared error weighted by gradient information, so that knowledge of forensics features is transferred adaptively. In addition, a decayed teaching strategy adjusts the importance of feature distillation over training, mitigating the risk of negative transfer. Extensive experiments show that student networks obtained by our model compression method achieve competitive detection performance and outstanding efficiency, markedly reducing computational costs compared with state-of-the-art methods.
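The joint loss described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the hyperparameters (`T`, `alpha`, `beta`, `decay`) and the use of per-channel gradient magnitudes as importance weights are assumptions inferred from the abstract, and a real system would compute the gradient weights via backpropagation.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def joint_distillation_loss(student_logits, teacher_logits, labels,
                            student_feat, teacher_feat, channel_grads,
                            T=4.0, alpha=0.5, beta=1.0, decay=1.0):
    """Hypothetical joint loss: cross-entropy + soft-target KD +
    gradient-weighted channel-wise feature MSE, with a decay factor
    on the feature term (the 'decayed teaching strategy')."""
    # 1) Cross-entropy against hard labels.
    p_s = softmax(student_logits)
    ce = -np.mean(np.log(p_s[np.arange(len(labels)), labels] + 1e-12))

    # 2) Knowledge distillation: cross-entropy between the teacher's and
    #    student's temperature-softened distributions, scaled by T^2.
    p_t = softmax(teacher_logits, T)
    log_ps = np.log(softmax(student_logits, T) + 1e-12)
    kd = -np.mean(np.sum(p_t * log_ps, axis=-1)) * T * T

    # 3) Feature distillation: channel-wise MSE over (N, C, H, W) feature
    #    maps, weighted by normalized per-channel gradient magnitudes so
    #    that forensically important channels are matched more closely.
    mse_per_channel = ((student_feat - teacher_feat) ** 2).mean(axis=(0, 2, 3))
    w = channel_grads / (channel_grads.sum() + 1e-12)
    fd = np.sum(w * mse_per_channel)

    # Decay shrinks the feature-distillation term as training progresses,
    # reducing the risk of negative transfer from the teacher.
    return ce + alpha * kd + decay * beta * fd
```

In practice the feature term would be evaluated at both shallow and deep layers (one weighted MSE per matched layer pair) and `decay` would be scheduled toward zero over epochs; the single-layer version above only illustrates the shape of the computation.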

Keywords