Infrared and Visible Image Fusion Based on Autoencoder Composed of CNN-Transformer

Hongmei Wang; Lin Li; Chenkai Li; Xuanyu Lu

doi:10.1109/ACCESS.2023.3298437

IEEE Access (Jan 2023)

Affiliations

Hongmei Wang: ORCiD; School of Astronautics, Northwestern Polytechnical University, Xi’an, China
Lin Li: Beijing Research Institute of Telemetry, Beijing, China
Chenkai Li: School of Astronautics, Northwestern Polytechnical University, Xi’an, China
Xuanyu Lu: School of Astronautics, Northwestern Polytechnical University, Xi’an, China

DOI: https://doi.org/10.1109/ACCESS.2023.3298437
Journal volume & issue: Vol. 11
pp. 78956 – 78969

Abstract

Image fusion models based on autoencoder networks have attracted growing attention because they do not require manually designed fusion rules. However, most autoencoder-based fusion networks use two-stream CNNs with the same structure as the encoder. Such encoders cannot extract global features, because convolution has only a local receptive field, and they cannot extract the features unique to infrared and visible images. This paper constructs a novel autoencoder-based image fusion network consisting of an encoder module, a fusion module, and a decoder module. In the encoder module, a CNN and a Transformer are combined to capture the local and global features of the source images simultaneously. In addition, novel contrast enhancement and gradient enhancement feature extraction blocks are designed for infrared and visible images, respectively, to maintain the information specific to each source image. The feature maps obtained from the encoder module are concatenated by the fusion module and passed to the decoder module to obtain the fused image. Experimental results on three datasets show that the proposed network better preserves both the clear targets of infrared images and the detailed information of visible images, and that it outperforms some state-of-the-art methods in both subjective and objective evaluation. In addition, the fused images produced by the proposed network achieve the highest mean average precision in target detection, which shows that image fusion is beneficial for downstream tasks.
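
The fusion step described in the abstract, where encoder features from the two modalities are concatenated and handed to a decoder, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `concat_fusion` and `toy_decoder` are hypothetical, feature maps are assumed to be plain nested lists shaped [channels][height][width], and the decoder here is a fixed per-pixel channel mean standing in for the learned decoder of the paper.

```python
# Illustrative sketch only; names and the averaging "decoder" are assumptions,
# not taken from the paper. Feature maps are nested lists: [channels][height][width].

def concat_fusion(ir_features, vis_features):
    """Fuse two feature stacks by concatenating them along the channel axis."""
    ir_shape = (len(ir_features[0]), len(ir_features[0][0]))
    vis_shape = (len(vis_features[0]), len(vis_features[0][0]))
    if ir_shape != vis_shape:
        raise ValueError("spatial sizes must match before concatenation")
    # List addition builds a new list whose channel count is the sum of both inputs.
    return ir_features + vis_features


def toy_decoder(fused):
    """Placeholder decoder: per-pixel mean over channels (a real decoder is learned)."""
    channels = len(fused)
    height, width = len(fused[0]), len(fused[0][0])
    return [
        [sum(fused[c][y][x] for c in range(channels)) / channels for x in range(width)]
        for y in range(height)
    ]


# Toy inputs: one infrared feature channel and two visible feature channels, each 2x2.
ir = [[[1.0, 0.0], [0.0, 1.0]]]
vis = [[[0.0, 2.0], [2.0, 0.0]], [[2.0, 1.0], [1.0, 2.0]]]

fused = concat_fusion(ir, vis)   # 3 channels after concatenation
image = toy_decoder(fused)       # single-channel 2x2 output
```

Concatenation keeps every channel from both encoders, so the decoder, rather than a hand-written rule, decides how to weigh infrared against visible information.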

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
