Remote Sensing (Jan 2024)

A Joint Convolutional Cross ViT Network for Hyperspectral and Light Detection and Ranging Fusion Classification

  • Haitao Xu,
  • Tie Zheng,
  • Yuzhe Liu,
  • Zhiyuan Zhang,
  • Changbin Xue,
  • Jiaojiao Li

DOI
https://doi.org/10.3390/rs16030489
Journal volume & issue
Vol. 16, no. 3
p. 489

Abstract

The fusion of hyperspectral imagery (HSI) and light detection and ranging (LiDAR) data for classification has received widespread attention and has driven significant progress in remote sensing research and applications. However, common CNN architectures cannot model remote sensing images globally, while transformer architectures do not capture local features effectively. To address these bottlenecks, this paper proposes a classification framework for multisource remote sensing image fusion. First, a spatial and spectral feature projection network based on parallel feature extraction is constructed to combine HSI and LiDAR data, which helps extract joint spatial, spectral, and elevation features from the different data sources. Furthermore, to construct local–global nonlinear feature mappings more flexibly, a network architecture that couples multiscale convolution with a multiscale vision transformer is proposed. Moreover, a plug-and-play nonlocal feature token aggregation module is designed to adaptively adjust the domain offsets between different features, while a class token is employed to reduce the complexity of high-dimensional feature fusion. On three open-source remote sensing datasets, the proposed multisource fusion classification framework improves on other state-of-the-art algorithms by about 1% to 3%.
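
The role of the class token in fusion can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a CrossViT-style scheme in which the class token of one branch (here HSI) acts as the only query and attends to the patch tokens of the other branch (here LiDAR), so the attention cost grows linearly with the number of tokens rather than quadratically. The function name, the token count, the embedding size, and the random weights are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def class_token_cross_attention(cls_a, tokens_b, w_q, w_k, w_v):
    """Single-head cross-attention with one query (assumed CrossViT-style).

    cls_a:    (1, d) class token of branch A (e.g. HSI).
    tokens_b: (n, d) patch tokens of branch B (e.g. LiDAR).
    Returns (updated class token of shape (1, d), attention weights (1, n + 1)).
    """
    # Keys and values cover the class token itself plus the other branch's tokens.
    kv = np.concatenate([cls_a, tokens_b], axis=0)   # (n + 1, d)
    q = cls_a @ w_q                                  # (1, d): a single query
    k = kv @ w_k                                     # (n + 1, d)
    v = kv @ w_v                                     # (n + 1, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (1, n + 1): linear in n
    attn = softmax(scores, axis=-1)
    # Residual connection keeps the original class-token content.
    return cls_a + attn @ v, attn

# Hypothetical sizes: 49 LiDAR patch tokens, embedding dimension 16.
rng = np.random.default_rng(0)
d, n = 16, 49
hsi_cls = rng.standard_normal((1, d))
lidar_tokens = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

fused_cls, attn = class_token_cross_attention(hsi_cls, lidar_tokens, w_q, w_k, w_v)
print(fused_cls.shape, attn.shape)
```

Because only one query is used, the score matrix has shape (1, n + 1) instead of (n + 1, n + 1), which is the sense in which a class token can reduce the cost of fusing high-dimensional token sets.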

Keywords