Multi-scale attention-based lightweight network with dilated convolutions for infrared and visible image fusion

Fuquan Li; Yonghui Zhou; YanLi Chen; Jie Li; ZhiCheng Dong; Mian Tan

doi:10.1007/s40747-023-01185-2

Complex & Intelligent Systems (Aug 2023)

Affiliations

Fuquan Li: School of Big Data and Computer Science, Guizhou Normal University
Yonghui Zhou: School of Big Data and Computer Science, Guizhou Normal University
YanLi Chen: School of Big Data and Computer Science, Guizhou Normal University
Jie Li: School of Intelligent Technology and Engineering, Chongqing University of Science and Technology
ZhiCheng Dong: School of Information Science and Technology, Tibet University
Mian Tan: Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guizhou Minzu University

DOI: https://doi.org/10.1007/s40747-023-01185-2
Journal volume & issue: Vol. 10, no. 1, pp. 705–719

Abstract

Infrared and visible image fusion aims to generate a synthetic image that contains both salient targets and abundant texture details. However, both traditional techniques and recent deep learning-based approaches struggle to preserve prominent structures and fine-grained features. In this study, we propose a lightweight infrared and visible image fusion network that uses multi-scale attention modules and hybrid dilated convolutional blocks to preserve significant structural features and fine-grained textural details. First, we design a hybrid dilated convolutional block with different dilation rates that enables the extraction of prominent structural features by enlarging the receptive field of the fusion network. Compared with other deep learning methods, our method obtains more high-level semantic information without stacking a large number of convolutional blocks, which effectively improves its feature representation ability. Second, distinct attention modules are integrated into different layers of the network to fully exploit the contextual information of the source images, and a total loss guides the fusion process to focus on vital regions and compensate for missing information. Extensive qualitative and quantitative experiments demonstrate the superiority of the proposed method over state-of-the-art methods in both visual quality and evaluation metrics. Experimental results on public datasets show that our method improves entropy (EN) by 4.80%, standard deviation (SD) by 3.97%, correlation coefficient (CC) by 1.86%, the sum of the correlations of differences (SCD) by 9.98%, and multi-scale structural similarity (MS_SSIM) by 5.64%. In addition, experiments on the VIFB dataset further indicate that our approach outperforms other comparable models.
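The abstract's claim that dilated convolutions enlarge the receptive field without stacking more blocks can be checked with simple arithmetic. The sketch below is illustrative only: the 3x3 kernel size and the dilation rates (1, 2, 3) are assumptions chosen for the example and are not taken from the paper, and `receptive_field` is a hypothetical helper name. For stride-1 convolutions, each layer with kernel size k and dilation d adds (k - 1) * d pixels to the receptive field along each axis.

```python
def receptive_field(kernel_size, dilations):
    """Per-axis receptive field of stacked stride-1 convolutions.

    Each layer with kernel size k and dilation d widens the field
    by (k - 1) * d pixels; a single pixel (field of 1) is the base case.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Three ordinary 3x3 convolutions (dilation 1 everywhere).
plain = receptive_field(3, [1, 1, 1])

# Three 3x3 convolutions with assumed hybrid dilation rates 1, 2, 3.
hybrid = receptive_field(3, [1, 2, 3])

print(plain, hybrid)  # prints "7 13"
```

Under these assumed settings, the same three layers and the same number of weights cover a 13x13 window instead of 7x7, which is the kind of gain the abstract attributes to the hybrid dilated block.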

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747
