IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
UMTF-Net: An Unsupervised Multiscale Transformer Fusion Network for Hyperspectral and Multispectral Image Fusion
Abstract
Hyperspectral images (HSIs) are widely used in tasks such as ground object classification and environmental monitoring because of their abundant spectral bands. However, owing to constraints of equipment and imaging conditions, HSIs often have limited spatial resolution. Fusing a low-resolution HSI with a high-resolution multispectral image (HR-MSI) of the same scene is an important way to generate a high-resolution HSI (HR-HSI). At present, because of factors such as complexity and GPU memory limitations, most deep learning (DL)-based HSI–MSI fusion algorithms cannot make good use of transformer modules to capture long-range dependencies in large remote sensing images. In addition, the lack of large amounts of high-quality training data limits the performance of DL-based fusion algorithms. To address these issues, this article introduces a new unsupervised multiscale transformer fusion (UMTF) network, called UMTF-Net, which performs HSI–MSI fusion without additional training data. UMTF-Net consists of an HSI fusion network and a U-network (U-Net)-based multiscale feature extraction network. To learn the cross-feature spatial similarity and long-range dependencies between the MSI and the HSI, we first extract multiscale features of the MSI with the U-Net-based multiscale feature extraction network. We then feed these features into the cross-feature fusion transformer module of the corresponding scale in the HSI fusion network for feature fusion. Next, the fused features pass through the spatial spectral fuse attention module for spatial-spectral feature enhancement, and the HR-HSI is finally generated. Fusion results on three datasets and multiple ablation experiments indicate that UMTF-Net compares favorably with other advanced methods across different evaluations.
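To make the idea of cross-feature fusion concrete, the following is a minimal sketch of generic scaled dot-product cross-attention, in which HSI feature tokens act as queries and MSI feature tokens act as keys and values. It is not the UMTF-Net module itself: the function names `softmax` and `cross_attention`, the choice of HSI as the query side, and the omission of learned projections, multiple heads, and multiscale handling are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(hsi_tokens, msi_tokens):
    """Hypothetical helper: each HSI token attends over all MSI tokens.

    hsi_tokens: list of query vectors (one per HSI spatial position).
    msi_tokens: list of key/value vectors (one per MSI spatial position).
    Both are assumed to share the same feature dimension; a real
    transformer would first apply learned query/key/value projections.
    """
    dim = len(msi_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in hsi_tokens:
        # Similarity of this HSI token to every MSI token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in msi_tokens]
        weights = softmax(scores)
        # Weighted average of MSI tokens (used here as the values).
        fused.append([sum(w * v[d] for w, v in zip(weights, msi_tokens))
                      for d in range(dim)])
    return fused
```

Because every query scores every key, each output token can draw on MSI features from anywhere in the image, which is the long-range dependency the abstract refers to. The cost also grows with the product of the two token counts, which illustrates why memory becomes a constraint for large remote sensing images.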
Keywords