International Journal of Applied Earth Observations and Geoinformation (Jul 2024)

An efficient Transformer with neighborhood contrastive tokenization for hyperspectral images classification

  • Miaomiao Liang,
  • Xianhao Zhang,
  • Xiangchun Yu,
  • Lingjuan Yu,
  • Zhe Meng,
  • Xiaohong Zhang,
  • Licheng Jiao

Journal volume & issue
Vol. 131
p. 103979

Abstract

The success of vision Transformers (ViTs) relies heavily on the self-attention mechanism, which in turn depends on appropriate patch tokenization. However, hyperspectral images (HSIs) often suffer from significant noise distortions and spectral uncertainty, which lead to equivocal tokenization and, consequently, unstable attention patterns and overfitting. In this paper, we propose a neighborhood contrastive tokenization task (NeiCoT) to learn compact, semantically meaningful, and context-sensitive tokens for efficient Transformer encoding. Specifically, we employ a predictor on the patch embedding to maximize the mutual information between local individuals and their global average anchor. This encourages neighboring tokens’ relevance and active participation in feature learning. Next, we revise a token-level contrastive loss to align predictions with local individuals and distinguish them from other samples in a mini-batch, thereby enhancing tokens rich in contextual semantics. Furthermore, we apply a Gaussian weighting to the tokens’ contrastive loss to balance the neighborhood contribution. Finally, we propose a sequence-specific MAE framework with NeiCoT to learn HSI representations, and additionally validate NeiCoT on a supervised Transformer backbone. The results demonstrate that NeiCoT consistently enhances the robustness and generalization of the Transformer, achieving accurate object recognition and boundary localization even with limited training samples. Our code will be available at https://github.com/zoegnov07/NeiCoT.
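
To make the loss described above more concrete, the following is a minimal NumPy sketch of a Gaussian-weighted, token-level contrastive objective. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the predictor architecture, the choice of negatives, or the similarity measure. Here the predictor is assumed to be a single linear map `W` applied to the average anchor, negatives are assumed to be the same-position tokens of the other samples in the mini-batch, and the names `gaussian_weights`, `neighborhood_contrastive_loss`, `tau`, and `sigma` are hypothetical.

```python
import numpy as np


def gaussian_weights(side, sigma=1.0):
    """Weights over a side x side neighborhood, peaked at the center, summing to 1."""
    coords = np.arange(side) - (side - 1) / 2.0
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    w = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return (w / w.sum()).reshape(-1)


def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def neighborhood_contrastive_loss(tokens, W, side, tau=0.1, sigma=1.0):
    """Token-level InfoNCE-style loss with Gaussian neighborhood weighting.

    tokens: (B, N, D) patch embeddings, with N == side * side.
    W:      (D, D) weights of an assumed linear predictor (hypothetical).
    """
    B, N, D = tokens.shape
    assert N == side * side, "tokens must cover a square neighborhood"

    # Global average anchor per sample, passed through the assumed predictor.
    anchor = tokens.mean(axis=1)            # (B, D)
    pred = l2_normalize(anchor @ W)         # (B, D)
    tok = l2_normalize(tokens)              # (B, N, D)

    # logits[n, b, c]: similarity of sample b's prediction to token n of sample c.
    logits = np.einsum("bd,cnd->nbc", pred, tok) / tau

    # Numerically stable log-softmax over the candidate samples c.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

    # Positives: the prediction of sample b paired with its own token n.
    pos = np.diagonal(log_prob, axis1=1, axis2=2)   # (N, B)

    # Gaussian weighting balances each neighbor's contribution to the loss.
    w = gaussian_weights(side, sigma)               # (N,)
    return float(-(w[:, None] * pos).sum(axis=0).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 25, 16))        # 8 samples, 5 x 5 neighborhood, 16-dim tokens
    print(neighborhood_contrastive_loss(x, np.eye(16), side=5))
```

In this sketch, a smaller `sigma` concentrates the loss on tokens near the patch center, while a larger `sigma` approaches uniform weighting over the neighborhood. How the published method sets these quantities should be taken from the paper or the released code.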

Keywords