Remote Sensing (Dec 2023)

Sparse Mix-Attention Transformer for Multispectral Image and Hyperspectral Image Fusion

  • Shihai Yu,
  • Xu Zhang,
  • Huihui Song

DOI
https://doi.org/10.3390/rs16010144
Journal volume & issue
Vol. 16, no. 1
p. 144

Abstract

Read online

Multispectral image (MSI) and hyperspectral image (HSI) fusion (MHIF) aims to address the challenge of acquiring high-resolution (HR) HSI images. This field combines a low-resolution (LR) HSI with an HR-MSI to reconstruct HR-HSIs. Existing methods directly utilize transformers to perform feature extraction and fusion. Despite the demonstrated success, there exist two limitations: (1) Employing the entire transformer model for feature extraction and fusion fails to fully harness the potential of the transformer in integrating the spectral information of the HSI and spatial information of the MSI. (2) HSIs have a strong spectral correlation and exhibit sparsity in the spatial domain. Existing transformer-based models do not optimize this physical property, which makes their methods prone to spectral distortion. To accomplish these issues, this paper introduces a novel framework for MHIF called a Sparse Mix-Attention Transformer (SMAformer). Specifically, to fully harness the advantages of the transformer architecture, we propose a Spectral Mix-Attention Block (SMAB), which concatenates the keys and values extracted from LR-HSIs and HR-MSIs to create a new multihead attention module. This design facilitates the extraction of detailed long-range information across spatial and spectral dimensions. Additionally, to address the spatial sparsity inherent in HSIs, we incorporated a sparse mechanism within the core of the SMAB called the Sparse Spectral Mix-Attention Block (SSMAB). In the SSMAB, we compute attention maps from queries and keys and select the K highly correlated values as the sparse-attention map. This approach enables us to achieve a sparse representation of spatial information while eliminating spatially disruptive noise. Extensive experiments conducted on three synthetic benchmark datasets, namely CAVE, Harvard, and Pavia Center, demonstrate that the SMAformer method outperforms state-of-the-art methods.

Keywords