IEEE Access (Jan 2024)
Dual-Stream Intermediate Fusion Network for Image Forgery Localization
Abstract
Powerful image editing applications have greatly simplified image processing and made edited images more realistic, but this convenience poses unprecedented challenges for verifying image authenticity. Although existing methods have achieved significant results in image forgery localization, most struggle to perform satisfactorily on tampered areas of varying sizes, especially large-scale tampered regions. To improve localization across various types and sizes of tampered regions, we propose a novel dual-stream intermediate fusion network for image forgery localization, named DIF-Net. The network adopts an encoder-decoder architecture composed of an adaptive convolutional pyramid and dual-stream intermediate fusion modules. The adaptive convolutional pyramid extracts multi-scale information from different depths using two depth-wise strip convolutions instead of standard large-kernel convolutions. During feature fusion, learnable parameters dynamically allocate a weight to each feature scale, so that the network can adaptively select the most relevant features at the target scale. The dual-stream intermediate fusion modules reduce category information differences between the two feature streams by using two learnable intermediate representations to model channel and spatial consistency in the dual-stream features. Compared with traditional and previous deep learning methods, DIF-Net generates high-quality prediction masks with fewer parameters. Extensive experiments show that DIF-Net performs strongly on various datasets and surpasses current state-of-the-art forgery localization methods. On the commonly used CASIA2 dataset, DIF-Net improves F1 by 3.3% and AUC by 2.4% over previous methods.
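As a rough illustration of two ideas named in the abstract, the sketch below counts the weights saved by replacing one k x k depth-wise convolution with a 1 x k and k x 1 strip pair, and fuses per-scale features with learnable weights. It is not the authors' implementation: the channel count (64), kernel size (11), the softmax normalization of the fusion weights, and all function names are assumptions made here for the example.

```python
import math


def depthwise_params(channels, kernel_h, kernel_w):
    """Weights in a depth-wise convolution: one kernel_h x kernel_w filter per channel, no bias."""
    return channels * kernel_h * kernel_w


def strip_pair_params(channels, k):
    """Weights in a 1 x k depth-wise convolution followed by a k x 1 one."""
    return depthwise_params(channels, 1, k) + depthwise_params(channels, k, 1)


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(features, scale_logits):
    """Weighted sum of equal-length feature vectors, one vector per scale.

    scale_logits stands in for the learnable per-scale parameters;
    normalizing them with softmax is an assumption of this sketch.
    """
    weights = softmax(scale_logits)
    length = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(length)]


# Hypothetical sizes, chosen only for illustration.
channels, k = 64, 11
print("k x k depth-wise weights:", depthwise_params(channels, k, k))
print("strip pair weights:", strip_pair_params(channels, k))

# Three toy "scales"; equal logits reduce the fusion to a plain average.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print("fused:", fuse_scales(features, [0.0, 0.0, 0.0]))
```

The strip pair covers the same k x k receptive field with 2k weights per channel rather than k squared, which is one plausible source of the parameter savings the abstract reports. In a trained network the logits would be updated by gradient descent, so the scale weights would shift toward whichever scales help most.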
Keywords