Electronic Research Archive (Mar 2024)
Analyzing temporal coherence for deepfake video detection
Abstract
Current facial image manipulation techniques have caused public concerns while achieving impressive quality. However, these techniques are mostly bound to a single frame for synthesized videos and pay little attention to the most discriminatory temporal frequency artifacts between various frames. Detecting deepfake videos using temporal modeling still poses a challenge. To address this issue, we present a novel deepfake video detection framework in this paper that consists of two levels: temporal modeling and coherence analysis. At the first level, to fully capture temporal coherence over the entire video, we devise an efficient temporal facial pattern (TFP) mechanism that explores the color variations of forgery-sensitive facial areas by providing global and local-successive temporal views. The second level presents a temporal coherence analyzing network (TCAN), which consists of novel global temporal self-attention characteristics, high-resolution fine and low-resolution coarse feature extraction, and aggregation mechanisms, with the aims of long-range relationship modeling from a local-successive temporal perspective within a TFP and capturing the vital dynamic incoherence for robust detection. Thorough experiments on large-scale datasets, including FaceForensics++, DeepFakeDetection, DeepFake Detection Challenge, CelebDF-V2, and DeeperForensics, reveal that our paradigm surpasses current approaches and stays effective when detecting unseen sorts of deepfake videos.
Keywords