IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

MHIAIFormer: Multihead Interacted and Adaptive Integrated Transformer With Spatial-Spectral Attention for Hyperspectral Image Classification

  • Delong Kong,
  • Jiahua Zhang,
  • Shichao Zhang,
  • Xiang Yu,
  • Foyez Ahmed Prodhan

DOI
https://doi.org/10.1109/JSTARS.2024.3441111
Journal volume & issue
Vol. 17
pp. 14486–14501

Abstract

Deep learning is an effective method for hyperspectral image (HSI) classification, where CNN-based and Transformer-based methods have achieved excellent performance. However, existing CNN-based and Transformer-based HSI classification approaches have several drawbacks: 1) CNN-based methods struggle to extract multiscale and localized features owing to the fixed-size input patch. 2) The multihead self-attention (MHSA) module ignores the interaction among attention heads, which leads to insufficient feature fusion across different directions. 3) MHSA disregards the weights of attention heads in different directions, simply concatenating the heads horizontally. To address these limitations, this study proposes a novel multihead interacted and adaptive integrated transformer (MHIAIFormer) with spatial-spectral attention, which integrates the respective advantages of convolutions and transformers. A pyramidal spatial-spectral attention (PS2A) feature extraction module is adopted to efficiently capture the localized and multiscale feature information of HSI. The output of PS2A is then fed into the transformer encoder stage through a grouped multiscale cross-dimension embedding module; the encoder combines additive self-attention with multihead interaction and MHSA with adaptive multihead merging to capture long-range dependencies among the features. Extensive experiments on four datasets verify that the proposed approach achieves higher classification accuracy than state-of-the-art models, reaching overall accuracies of 95.97%, 98.68%, 92.68%, and 99.49% on the four datasets, respectively.
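
The adaptive multihead merging mentioned above replaces the standard horizontal concatenation of attention heads with a learned weighted fusion. The abstract does not give the exact formulation, so the following PyTorch sketch only illustrates the general idea under stated assumptions: the module name AdaptiveHeadMerge and the one-scalar-weight-per-head scheme are hypothetical, not the paper's method.

```python
import torch
import torch.nn as nn

class AdaptiveHeadMerge(nn.Module):
    """Illustrative sketch: fuse attention heads with learned, softmax-normalized
    weights instead of plain concatenation. Names and the per-head scalar
    weighting are assumptions, not MHIAIFormer's exact formulation."""

    def __init__(self, num_heads: int, head_dim: int, embed_dim: int):
        super().__init__()
        # One learnable scalar weight per attention head.
        self.head_weights = nn.Parameter(torch.ones(num_heads))
        self.proj = nn.Linear(num_heads * head_dim, embed_dim)

    def forward(self, heads: torch.Tensor) -> torch.Tensor:
        # heads: (batch, num_heads, seq_len, head_dim)
        w = torch.softmax(self.head_weights, dim=0)      # (num_heads,)
        weighted = heads * w.view(1, -1, 1, 1)           # reweight each head
        b, h, n, d = weighted.shape
        merged = weighted.transpose(1, 2).reshape(b, n, h * d)
        return self.proj(merged)

# Usage: merge 4 heads of dim 16 back into a 64-dim token embedding.
merge = AdaptiveHeadMerge(num_heads=4, head_dim=16, embed_dim=64)
out = merge(torch.randn(2, 4, 50, 16))                   # -> (2, 50, 64)
```

In plain concatenation every head contributes equally; here the softmax weights let the model down-weight uninformative heads before the output projection, which is one simple way to realize the adaptive merging the abstract describes.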

Keywords