Remote Sensing (May 2024)

Adaptive Learnable Spectral–Spatial Fusion Transformer for Hyperspectral Image Classification

  • Minhui Wang,
  • Yaxiu Sun,
  • Jianhong Xiang,
  • Rui Sun,
  • Yu Zhong

DOI
https://doi.org/10.3390/rs16111912
Journal volume & issue
Vol. 16, no. 11
p. 1912

Abstract

In hyperspectral image classification (HSIC), every pixel of a hyperspectral image (HSI) is assigned to a land cover category. While convolutional neural network (CNN)-based methods have significantly improved HSIC performance, they struggle to learn the relevance of deep semantic features and incur escalating computational costs as network depth increases. In contrast, the transformer framework is adept at capturing the relevance of high-level semantic features, offering an effective remedy for these limitations. This article introduces a novel adaptive learnable spectral–spatial fusion transformer (ALSST) for HSI classification. The model incorporates a dual-branch adaptive spectral–spatial fusion gating mechanism (ASSF), which effectively captures spectral–spatial fusion features from images. The ASSF comprises two key components: the point depthwise attention module (PDWA) for spectral feature extraction and the asymmetric depthwise attention module (ADWA) for spatial feature extraction. The model obtains spectral–spatial fusion features efficiently by multiplying the outputs of the two branches. Furthermore, we integrate layer scale and DropKey into the traditional transformer encoder and multi-head self-attention (MHSA) to form a new transformer with layer scale and DropKey (LD-Former). This design enhances data dynamics and mitigates performance degradation in deeper encoder layers. Experiments on four well-known datasets, Trento (TR), MUUFL (MU), Augsburg (AU), and the University of Pavia (UP), demonstrate that ALSST outperforms several existing models, achieving overall accuracies (OA) of 99.70%, 89.72%, 97.84%, and 99.78%, respectively.
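
The abstract states only that ASSF fuses a spectral branch (PDWA) and a spatial branch (ADWA) by multiplying their outputs. The sketch below illustrates that dual-branch gating pattern in PyTorch under stated assumptions: the kernel sizes, the sigmoid gating, and the exact conv compositions inside PDWA and ADWA are hypothetical stand-ins, not the paper's definitions.

```python
# Minimal sketch of the dual-branch ASSF gating idea: a spectral branch
# (PDWA) and a spatial branch (ADWA) whose outputs are multiplied
# element-wise to form spectral-spatial fusion features.
import torch
import torch.nn as nn

class PDWA(nn.Module):
    """Spectral branch: a pointwise conv mixes bands, a depthwise conv
    refines, and a sigmoid gate re-weights channels (assumed formulation)."""
    def __init__(self, channels: int):
        super().__init__()
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.gate(self.depthwise(self.pointwise(x)))
        return x * a          # channel/spectral attention

class ADWA(nn.Module):
    """Spatial branch: asymmetric 1xk and kx1 depthwise convs approximate a
    kxk receptive field at lower cost (assumed formulation)."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.dw_h = nn.Conv2d(channels, channels, kernel_size=(1, k),
                              padding=(0, k // 2), groups=channels)
        self.dw_v = nn.Conv2d(channels, channels, kernel_size=(k, 1),
                              padding=(k // 2, 0), groups=channels)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.gate(self.dw_v(self.dw_h(x)))
        return x * a          # spatial attention

class ASSF(nn.Module):
    """Fuse the two branches by element-wise multiplication, as stated."""
    def __init__(self, channels: int):
        super().__init__()
        self.spectral = PDWA(channels)
        self.spatial = ADWA(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spectral(x) * self.spatial(x)

# Example: a batch of 11x11 HSI patches with 64 channels after band reduction.
patches = torch.randn(8, 64, 11, 11)
fused = ASSF(64)(patches)     # -> torch.Size([8, 64, 11, 11])
```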
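
The two LD-Former ingredients named in the abstract are well documented elsewhere: DropKey randomly masks attention logits before the softmax so that entire keys receive zero weight during training, and layer scale multiplies each residual branch by a small learnable per-channel factor. A minimal sketch follows; the head count, drop probability, and initialization value are assumptions, and the layer layout is a generic pre-norm encoder rather than the paper's exact architecture.

```python
# Sketch of an encoder layer combining DropKey MHSA with layer scale.
import torch
import torch.nn as nn

class DropKeyMHSA(nn.Module):
    def __init__(self, dim: int, heads: int = 4, p_dropkey: float = 0.1):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.p = p_dropkey

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.heads, C // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: B, H, N, C/H
        attn = (q @ k.transpose(-2, -1)) * self.scale  # logits: B, H, N, N
        if self.training and self.p > 0:
            # DropKey: mask logits (not probabilities) so dropped keys get
            # exactly zero weight after the softmax.
            mask = torch.rand_like(attn) < self.p
            attn = attn.masked_fill(mask, float('-inf'))
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

class LDEncoderLayer(nn.Module):
    """Pre-norm transformer layer with layer scale on both residual branches;
    the small initial scale damps each layer's contribution, which is the
    mechanism credited with stabilizing deeper encoders."""
    def __init__(self, dim: int, heads: int = 4, init_scale: float = 1e-4):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = DropKeyMHSA(dim, heads)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))
        self.ls1 = nn.Parameter(init_scale * torch.ones(dim))
        self.ls2 = nn.Parameter(init_scale * torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.ls1 * self.attn(self.norm1(x))
        x = x + self.ls2 * self.mlp(self.norm2(x))
        return x

tokens = torch.randn(8, 121, 64)    # e.g., flattened 11x11 patch tokens
out = LDEncoderLayer(64)(tokens)    # same shape: B x N x C
```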

Keywords