A Joint Convolutional Cross ViT Network for Hyperspectral and Light Detection and Ranging Fusion Classification

Haitao Xu; Tie Zheng; Yuzhe Liu; Zhiyuan Zhang; Changbin Xue; Jiaojiao Li

doi:10.3390/rs16030489

Remote Sensing (Jan 2024)


Affiliations

Haitao Xu: National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
Tie Zheng: National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
Yuzhe Liu: The State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710200, China
Zhiyuan Zhang: The State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710200, China
Changbin Xue: National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
Jiaojiao Li: The State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710200, China

DOI: https://doi.org/10.3390/rs16030489
Journal volume & issue: Vol. 16, no. 3
p. 489

Abstract


The fusion of hyperspectral imagery (HSI) and light detection and ranging (LiDAR) data for classification has received widespread attention and has led to significant progress in research and remote sensing applications. However, common CNN architectures are unable to model remote sensing images globally, while transformer architectures do not capture local features effectively. To address these bottlenecks, this paper proposes a classification framework for multisource remote sensing image fusion. First, a spatial and spectral feature projection network is built on parallel feature extraction from HSI and LiDAR data, which facilitates the extraction of joint spatial, spectral, and elevation features from the different data sources. Second, to construct local–global nonlinear feature mappings more flexibly, a network architecture that couples multiscale convolution with a multiscale vision transformer is proposed. Moreover, a plug-and-play nonlocal feature token aggregation module is designed to adaptively adjust the domain offsets between different features, while a class token is employed to reduce the complexity of high-dimensional feature fusion. On three open-source remote sensing datasets, the proposed multisource fusion classification framework outperforms other state-of-the-art algorithms by about 1% to 3%.
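The abstract says a class token is used to reduce the complexity of high-dimensional feature fusion, but it does not give the module's exact form. As an illustration only, the sketch below assumes a CrossViT-style scheme in which the class token of one branch (HSI or LiDAR) is the sole query attending to the other branch's patch tokens, so each fusion step scores n + 1 tokens instead of forming a full quadratic attention map. The function name `cls_cross_attention`, the token sizes, the single head, and the shared random projections are hypothetical choices for the sketch, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cls_cross_attention(cls_a, tokens_b, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention: only branch A's class token queries branch B.

    cls_a:    (d,)   class token of branch A (e.g. the HSI branch)
    tokens_b: (n, d) patch tokens of branch B (e.g. the LiDAR branch)
    Returns the updated class token (d,) and the attention weights (n + 1,).
    """
    q = cls_a @ w_q                                  # (d,) a single query
    kv = np.vstack([cls_a[None, :], tokens_b])       # (n + 1, d) CLS of A plus tokens of B
    k = kv @ w_k                                     # (n + 1, d)
    v = kv @ w_v                                     # (n + 1, d)
    scores = k @ q / np.sqrt(q.shape[-1])            # (n + 1,) linear in n, not quadratic
    attn = softmax(scores)
    return cls_a + attn @ v, attn                    # residual connection on the class token

rng = np.random.default_rng(0)
d, n = 16, 49                                        # hypothetical: 7x7 patch, 16-dim tokens
hsi_cls, hsi_tokens = rng.normal(size=d), rng.normal(size=(n, d))
lidar_cls, lidar_tokens = rng.normal(size=d), rng.normal(size=(n, d))
# Simplification: both directions share one set of random projections
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

# Exchange information in both directions, then concatenate the two class tokens
hsi_fused, attn_h = cls_cross_attention(hsi_cls, lidar_tokens, w_q, w_k, w_v)
lidar_fused, attn_l = cls_cross_attention(lidar_cls, hsi_tokens, w_q, w_k, w_v)
joint = np.concatenate([hsi_fused, lidar_fused])     # (2d,) input to a classifier head

# Attention-score counts: joint self-attention over all tokens vs. CLS-only cross-attention
full_scores = (2 * (n + 1)) ** 2
cls_scores = 2 * (n + 1)
print(joint.shape)                                   # (32,)
print(full_scores, cls_scores)                       # 10000 100
```

The design point is that the class token acts as a compact summary of its own branch, so fusing two sources costs one query per direction rather than a full token-by-token attention map over both modalities.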

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
