IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
A Cross-Attention-Based Multi-Information Fusion Transformer for Hyperspectral Image Classification
Abstract
In recent years, deep-learning-based classification methods have been widely used for hyperspectral images (HSIs). However, existing transformer-based HSI classification methods still leave room for improvement in how effectively and comprehensively they exploit the rich available information; for example, when multiple sources of image information are used, the interaction among them is insufficiently considered. To address these issues, cross-attention interaction, class-token and patch-token information, and multiscale spatial information are handled in a unified framework, and a cross-attention-based multi-information fusion transformer (CAMFT) is proposed for HSI classification. CAMFT comprises a multiscale patch embedding module, a residual-connection-based DeepViT (RCD) module, and a double-branch cross-attention (DBCA) module. First, the multiscale patch embedding module preprocesses the multi-information input by constructing processing branches at different scales and adding learnable class tokens. Second, the RCD module, which includes re-attention and residual connections, exploits rich information from different layers. Third, the DBCA module is constructed to obtain more representative multi-information fusion features; it not only integrates multiscale patch information but also effectively exploits the complementary information between class tokens and patch tokens through the interaction of the two branches. Extensive experiments demonstrate that, compared with other state-of-the-art classification methods, the proposed CAMFT method achieves the best classification performance and maintains excellent accuracy even with small training sample sizes.
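To make the double-branch cross-attention idea concrete, the following is a minimal sketch (not the authors' code) of the mechanism described in the abstract: the class token of one branch attends over the patch tokens of the other branch, so the two multiscale branches exchange complementary class-token and patch-token information. The module name, dimensions, and head count are illustrative assumptions.

```python
import torch
import torch.nn as nn


class CrossAttention(nn.Module):
    """Sketch: the class token of branch A queries the patch tokens of branch B."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, cls_a: torch.Tensor, tokens_b: torch.Tensor) -> torch.Tensor:
        # cls_a:    (B, 1, dim)  learnable class token from branch A
        # tokens_b: (B, N, dim)  patch tokens from branch B (another scale)
        q = self.norm_q(cls_a)
        kv = self.norm_kv(tokens_b)
        out, _ = self.attn(q, kv, kv)   # cross-attention between the two branches
        return cls_a + out              # residual connection on the class token


if __name__ == "__main__":
    B, N, dim = 2, 49, 64
    cls_small = torch.randn(B, 1, dim)      # class token from the small-scale branch
    patches_large = torch.randn(B, N, dim)  # patch tokens from the large-scale branch
    fused = CrossAttention(dim)(cls_small, patches_large)
    print(fused.shape)                      # torch.Size([2, 1, 64])
```

In the full CAMFT design described above, this exchange would be applied symmetrically in both branches and combined with the multiscale patch embedding and RCD modules; the sketch only illustrates the token-level interaction.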
Keywords