IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
Adaptive Pixel-Level and Superpixel-Level Feature Fusion Transformer for Hyperspectral Image Classification
Abstract
Significant progress has been achieved in hyperspectral image (HSI) classification through the application of transformer blocks. Although transformers possess strong long-range dependency modeling capabilities, they primarily extract nonlocal information from patches and often fail to fully capture global information, leading to incomplete spectral–spatial feature extraction. In contrast, graph convolutional networks (GCNs) can effectively extract features from the global structure. This article proposes an adaptive pixel-level and superpixel-level feature fusion transformer (APSFFT). The network comprises two branches: a convolutional neural network (CNN) and transformer network (CNTN), and a GCN and transformer network (GNTN), designed to extract pixel-level and superpixel-level feature information from the HSI, respectively. The CNTN leverages the strengths of CNNs in extracting spectral–spatial information, combined with the transformer network's ability to establish long-range dependencies through self-attention (SA). The GNTN fully extracts superpixel-level features while establishing long-range dependencies. To adaptively fuse the features from the two branches, an adaptive cross-token attention fusion (ACTAF) encoder is employed: it fuses the classification tokens from both branches through SA, thereby enhancing the model's ability to capture interactions between pixel-level and superpixel-level features. We compared APSFFT with seven advanced HSI classification algorithms, and experiments show that it outperforms these state-of-the-art methods.
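To make the ACTAF idea concrete, the following is a minimal PyTorch sketch of cross-token attention fusion: each branch's classification token queries the other branch's token sequence, and the two updated tokens are combined for classification. This is only an illustration under stated assumptions, not the paper's implementation; the class name CrossTokenAttentionFusion, the multi-head attention setup, and the learned linear combination are all hypothetical choices.

```python
import torch
import torch.nn as nn

class CrossTokenAttentionFusion(nn.Module):
    """Hypothetical sketch of ACTAF-style fusion: each branch's
    classification (cls) token attends to the other branch's tokens."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # One cross-attention per direction: pixel cls -> superpixel tokens,
        # superpixel cls -> pixel tokens.
        self.attn_p = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_s = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learned projection to adaptively combine the two fused tokens
        # (an assumption; the paper's exact fusion rule may differ).
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, tokens_p: torch.Tensor, tokens_s: torch.Tensor) -> torch.Tensor:
        # tokens_*: (batch, 1 + n_tokens, dim); index 0 is the cls token.
        cls_p, cls_s = tokens_p[:, :1], tokens_s[:, :1]
        # Each cls token queries the opposite branch's full token sequence.
        cls_p2, _ = self.attn_p(cls_p, tokens_s, tokens_s)
        cls_s2, _ = self.attn_s(cls_s, tokens_p, tokens_p)
        # Concatenate and project to a single fused feature per sample.
        fused = self.fuse(torch.cat([cls_p2, cls_s2], dim=-1)).squeeze(1)
        return fused  # (batch, dim), fed to a classification head

# Usage sketch: fuse pixel-level and superpixel-level token sequences.
fusion = CrossTokenAttentionFusion(dim=64)
out = fusion(torch.randn(8, 17, 64), torch.randn(8, 33, 64))  # (8, 64)
```

The cross-direction queries are what let the fused representation capture interactions between pixel-level and superpixel-level features, rather than simply concatenating the two branch outputs.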
Keywords