International Journal of Applied Earth Observations and Geoinformation (Feb 2025)

PCET: Patch Confidence-Enhanced Transformer with efficient spectral–spatial features for hyperspectral image classification

  • Li Fang,
  • Xuanli Lan,
  • Tianyu Li,
  • Huifang Shen

Journal volume & issue
Vol. 136
p. 104308

Abstract

Hyperspectral image (HSI) classification based on deep learning has demonstrated promising performance. In general, patch-wise samples help capture the spatial relationships between pixels and local contextual information. However, background pixels or pixels of other categories within an image patch that are inconsistent with the category of the central target pixel have a negative effect on classification. To address this issue, we propose a patch confidence-enhanced transformer (PCET) approach for HSI classification. First, we design a patch quality assessment (PQA) branch model that evaluates the input patches during training and effectively filters out intrusive non-central pixels. The confidence output by the branch model serves as a quantitative indicator of how much each input patch contributes to training and is used as a weight in the loss function, allowing the model to dynamically adjust its learning focus according to the quality of the inputs. Second, a spectral–spatial multi-feature fusion (SSMF) module is devised to capture rich representative information simultaneously and to fully exploit the multi-scale features of HSI data. Finally, to enhance feature discrimination, global context is modeled with the efficient additive attention transformer (EA2T) module, which streamlines the attention process and allows the model to learn efficient and robust global representations for accurate classification of the central pixel. Experiments on real HSI datasets show that the proposed PCET achieves outstanding performance, even when only 10 samples per category are used for training.
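As an illustration of how a per-patch confidence can be used as a weight in a loss function, the following is a minimal sketch and not the authors' formulation, which the abstract does not specify. It assumes a standard cross-entropy loss, confidences in [0, 1] supplied by some quality-assessment branch, and normalisation by the total confidence. The function name `confidence_weighted_cross_entropy` and all parameter names are hypothetical.

```python
import numpy as np


def confidence_weighted_cross_entropy(logits, labels, confidence):
    """Cross-entropy over patches, each term scaled by a confidence weight.

    Illustrative assumption only: the abstract does not give the exact loss.
    logits:     (n_patches, n_classes) raw classifier scores
    labels:     (n_patches,) class index of each patch's central pixel
    confidence: (n_patches,) weights in [0, 1] from a quality-assessment branch
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    confidence = np.asarray(confidence, dtype=float)

    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Negative log-likelihood of the central pixel's label for each patch.
    nll = -log_probs[np.arange(len(labels)), labels]

    # Normalise by the total confidence (assumed choice) so that the loss
    # scale does not shrink when many patches are down-weighted.
    return float((confidence * nll).sum() / (confidence.sum() + 1e-12))
```

With all confidences equal to 1 this reduces to the ordinary mean cross-entropy, while a patch with confidence 0 contributes nothing to the loss, which is the sense in which low-quality patches receive less learning focus.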

Keywords