Sensors (Aug 2023)

DPACFuse: Dual-Branch Progressive Learning for Infrared and Visible Image Fusion with Complementary Self-Attention and Convolution

  • Huayi Zhu,
  • Heshan Wu,
  • Xiaolong Wang,
  • Dongmei He,
  • Zhenbing Liu,
  • Xipeng Pan

DOI
https://doi.org/10.3390/s23167205
Journal volume & issue
Vol. 23, no. 16
p. 7205

Abstract

Infrared and visible image fusion aims to generate a single fused image that not only contains rich texture details and salient objects, but also facilitates downstream tasks. However, existing works mainly focus on learning modality-specific or modality-shared features and ignore the importance of modeling cross-modality features. To address these challenges, we propose DPACFuse, a Dual-branch Progressive learning network for infrared and visible image fusion with complementary self-Attention and Convolution. On the one hand, we propose Cross-Modality Feature Extraction (CMEF) to enhance information interaction and the extraction of common features across modalities. In addition, we introduce a high-frequency gradient convolution operation to extract fine-grained information and suppress the loss of high-frequency information. On the other hand, to alleviate the insufficient global information extraction of CNNs and the heavy computational overhead of self-attention, we introduce ACmix, which fully extracts local and global information from the source images at a smaller computational cost than pure convolution or pure self-attention. Extensive experiments demonstrated that the fused images generated by DPACFuse not only contain rich texture information, but also effectively highlight salient objects. Additionally, our method achieved an improvement of approximately 3% over state-of-the-art methods on the MI, Qabf, SF, and AG evaluation indicators. More importantly, our fused images improved object detection and semantic segmentation by approximately 10% compared to using the infrared and visible images separately.
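The abstract names two concrete building blocks: a high-frequency gradient convolution for preserving edge detail, and ACmix (Pan et al., CVPR 2022), which shares 1x1 projections between a convolution path and a self-attention path and mixes the two outputs with learned scalars. The PyTorch sketch below is a minimal illustration of those two ideas only, not the authors' implementation: the class names, channel sizes, the fixed Sobel kernels, the single-head global attention, and the plain 3x3 convolution path (in place of ACmix's shift-and-sum aggregation) are all simplifying assumptions.

```python
# Minimal sketch of the two blocks named in the abstract (assumptions noted
# in comments). Not the DPACFuse code: single attention head, fixed Sobel
# kernels, and a plain 3x3 conv path instead of ACmix's shift-based one.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SobelGradientConv(nn.Module):
    """High-frequency branch: fixed depthwise Sobel filters extract
    horizontal/vertical gradients so fine texture is kept explicit."""

    def __init__(self, channels: int):
        super().__init__()
        gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        gy = gx.t()
        kernel = torch.stack([gx, gy]).unsqueeze(1)      # (2, 1, 3, 3)
        kernel = kernel.repeat(channels, 1, 1, 1)        # one gx/gy pair per channel
        self.register_buffer("kernel", kernel)
        self.channels = channels
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        grads = F.conv2d(x, self.kernel, padding=1, groups=self.channels)
        return self.fuse(grads)                          # back to C channels


class ACmixLite(nn.Module):
    """ACmix-style block: one set of 1x1 projections feeds both a
    self-attention path and a convolution path; learnable scalars
    alpha/beta weight the two outputs."""

    def __init__(self, channels: int):
        super().__init__()
        self.qkv = nn.Conv2d(channels, 3 * channels, kernel_size=1)
        self.conv_path = nn.Conv2d(3 * channels, channels,
                                   kernel_size=3, padding=1)
        self.alpha = nn.Parameter(torch.ones(1))         # attention weight
        self.beta = nn.Parameter(torch.ones(1))          # convolution weight

    def forward(self, x):
        b, c, h, w = x.shape
        feats = self.qkv(x)                              # shared 1x1 projections
        q, k, v = feats.chunk(3, dim=1)
        # Self-attention path over flattened spatial positions.
        # O((HW)^2) global attention; the real ACmix is cheaper.
        q = q.flatten(2).transpose(1, 2)                 # (B, HW, C)
        k = k.flatten(2)                                 # (B, C, HW)
        v = v.flatten(2).transpose(1, 2)                 # (B, HW, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)   # (B, HW, HW)
        att_out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # Convolution path reuses the same projected features.
        conv_out = self.conv_path(feats)
        return self.alpha * att_out + self.beta * conv_out
```

The point of sharing the 1x1 projections is that the convolution and attention paths reuse most of the computation, which is why ACmix costs less than running pure convolution and pure self-attention side by side; in the paper these branch outputs would feed the dual-branch progressive fusion, while here they stand alone for illustration.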

Keywords