Forged facial video detection framework based on multi-region temporal relationship feature

Xing Fang; YanNi Hao; Yin Luo; Nan Xu; Jia Cao

doi:10.1063/5.0125032

AIP Advances (Aug 2023)

Affiliations

Xing Fang: School of Management, Tianjin University, Tianjin 300072, China
YanNi Hao: Beijing Wenge Technology Co., Ltd., Beijing 100000, China
Yin Luo: Beijing Wenge Technology Co., Ltd., Beijing 100000, China
Nan Xu: Beijing Wenge Technology Co., Ltd., Beijing 100000, China
Jia Cao: Beijing Wenge Technology Co., Ltd., Beijing 100000, China

DOI: https://doi.org/10.1063/5.0125032
Journal volume & issue: Vol. 13, no. 8
pp. 085026 – 085026-8

Abstract

Face generation and manipulation techniques based on deep learning have enabled the creation of sophisticated forged facial videos that the human eye cannot distinguish from genuine ones. The illegal use of deepfake technology can seriously harm social stability, personal reputation, and even national security, so detecting forged facial videos is important for protecting national security and maintaining social order. Although existing video-based detection methods achieve good performance on public forged facial video datasets, two problems remain: (1) existing methods use a 2D attention mechanism to obtain local region features from face images and lack a 3D attention mechanism for obtaining local region features from face videos; (2) after obtaining local region features, existing methods either classify them directly or model only the inter-region relationships within images, without modeling the temporal relationships between regions across the video. This paper proposes a forged facial video detection framework based on multi-region temporal relationship features. A three-dimensional attention mechanism is designed to extract local features of multiple face regions from the video. To model the temporal relationships between different face regions, a temporal graph convolutional neural network is introduced to extract temporal relationship features between the regions. From the way these inter-region temporal relationships change, the temporal inconsistency of the face video is detected in order to determine whether the face has been forged.
Experiments were carried out on multiple datasets. In the accuracy tests, the method proposed in this paper achieves the highest detection accuracy, and its accuracy on the FaceForensics++ (low-definition) dataset is 18.19% higher than that of the benchmark method. In the generalization tests, the proposed method achieves the highest generalization performance, and its detection accuracy on the Celeb-DF dataset is 11.92% higher than that of the benchmark method.
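
The pipeline described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the dot-product similarity graph, the weight-free propagation step, and the mean-absolute-change score are all assumptions chosen for illustration, and the attention logits are taken as given rather than learned. Each region has one attention map defined over time and space, region features are pooled per frame, a region graph is built for every frame, and the change of that graph between consecutive frames serves as a rough indicator of temporal inconsistency.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pool_regions(video, logits):
    """Attention-pool frame features into one vector per region per frame.

    video:  [T][P][C] features (T frames, P spatial positions, C channels)
    logits: [R][T][P] attention logits, one time-by-space map per region
            (hypothetical input; a real model would predict these)
    returns [T][R][C]
    """
    channels = len(video[0][0])
    pooled = []
    for t, frame in enumerate(video):
        regions = []
        for region_logits in logits:
            w = softmax(region_logits[t])  # weights over positions in frame t
            regions.append([sum(w[p] * frame[p][c] for p in range(len(frame)))
                            for c in range(channels)])
        pooled.append(regions)
    return pooled


def adjacency(nodes):
    """Row-normalised region graph from dot-product similarity (an assumption)."""
    return [softmax([sum(a * b for a, b in zip(u, v)) for v in nodes])
            for u in nodes]


def propagate(adj, nodes):
    """One graph propagation step without learned weights: X' = A X."""
    channels = len(nodes[0])
    return [[sum(adj[i][j] * nodes[j][c] for j in range(len(nodes)))
             for c in range(channels)]
            for i in range(len(nodes))]


def temporal_relation_score(video, logits):
    """Return (relation features [T][R][C], mean frame-to-frame graph change)."""
    pooled = pool_regions(video, logits)
    graphs = [adjacency(frame) for frame in pooled]
    features = [propagate(g, frame) for g, frame in zip(graphs, pooled)]
    diffs = []
    for prev, cur in zip(graphs, graphs[1:]):
        total = sum(abs(a - b)
                    for row_p, row_c in zip(prev, cur)
                    for a, b in zip(row_p, row_c))
        diffs.append(total / (len(prev) ** 2))
    score = sum(diffs) / len(diffs) if diffs else 0.0
    return features, score
```

In this sketch a video whose frames are identical yields a score of exactly zero, while a frame in which the regions suddenly relate to each other differently raises the score. A trained detector would instead learn the attention maps and graph weights and feed the relation features to a classifier.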

Published in AIP Advances

ISSN: 2158-3226 (Online)
Publisher: AIP Publishing LLC
Country of publisher: United States
LCC subjects: Science: Physics
Website: http://aipadvances.aip.org/
