IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

MHIAIFormer: Multihead Interacted and Adaptive Integrated Transformer With Spatial-Spectral Attention for Hyperspectral Image Classification

  • Delong Kong,
  • Jiahua Zhang,
  • Shichao Zhang,
  • Xiang Yu,
  • Foyez Ahmed Prodhan

DOI
https://doi.org/10.1109/JSTARS.2024.3441111
Journal volume & issue
Vol. 17
pp. 14486–14501

Abstract

Deep learning is an effective method for hyperspectral image (HSI) classification, where CNN-based and Transformer-based methods have achieved excellent performance. However, existing CNN-based and Transformer-based HSI classification approaches have several drawbacks: 1) CNN-based methods struggle to extract multiscale and localized features owing to the fixed-size input patch. 2) The multihead self-attention (MHSA) module ignores the interaction among attention heads, which leads to insufficient feature fusion across different directions. 3) MHSA disregards the weights of attention heads in different directions, simply concatenating the heads horizontally. To address these limitations, this study proposes a novel multihead interacted and adaptive integrated transformer (MHIAIFormer) with spatial-spectral attention, which integrates the respective advantages of convolutions and transformers. A pyramidal spatial-spectral attention (PS2A) feature extraction module is adopted to efficiently capture the localized and multiscale feature information of HSI. The output of PS2A is then fed into the transformer encoder stage through a grouped multiscale cross-dimension embedding module; the encoder combines additive self-attention with multihead interaction and MHSA with adaptive multihead merging to capture long-range dependencies among the features. Extensive experiments on four datasets verify that the proposed approach achieves higher classification accuracy than state-of-the-art models, reaching overall accuracies of 95.97%, 98.68%, 92.68%, and 99.49% on the four datasets, respectively.
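
The adaptive multihead merging mentioned above replaces the standard horizontal concatenation of attention heads with a learned weighted fusion. The abstract does not give the exact formulation, so the following PyTorch sketch only illustrates the general idea under stated assumptions: the module name AdaptiveHeadMerge and the one-scalar-weight-per-head scheme are hypothetical, not the paper's method.

```python
import torch
import torch.nn as nn

class AdaptiveHeadMerge(nn.Module):
    """Illustrative sketch: fuse attention heads with learned, softmax-normalized
    weights instead of plain concatenation. Names and the per-head scalar
    weighting are assumptions, not MHIAIFormer's exact formulation."""

    def __init__(self, num_heads: int, head_dim: int, embed_dim: int):
        super().__init__()
        # One learnable scalar weight per attention head.
        self.head_weights = nn.Parameter(torch.ones(num_heads))
        self.proj = nn.Linear(num_heads * head_dim, embed_dim)

    def forward(self, heads: torch.Tensor) -> torch.Tensor:
        # heads: (batch, num_heads, seq_len, head_dim)
        w = torch.softmax(self.head_weights, dim=0)      # (num_heads,)
        weighted = heads * w.view(1, -1, 1, 1)           # reweight each head
        b, h, n, d = weighted.shape
        merged = weighted.transpose(1, 2).reshape(b, n, h * d)
        return self.proj(merged)

# Usage: merge 4 heads of dim 16 back into a 64-dim token embedding.
merge = AdaptiveHeadMerge(num_heads=4, head_dim=16, embed_dim=64)
out = merge(torch.randn(2, 4, 50, 16))                   # -> (2, 50, 64)
```

In plain concatenation every head contributes equally; here the softmax weights let the model down-weight uninformative heads before the output projection, which is one simple way to realize the adaptive merging the abstract describes.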

Keywords