IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

MSCC-ViT: A Multiscale Visual-Transformer Network Using Convolution Crossing Attention for Hyperspectral Unmixing

  • Cheng Xu,
  • Fei Ye,
  • Fanqiang Kong,
  • Yunsong Li,
  • Zhijie Lv

DOI
https://doi.org/10.1109/JSTARS.2024.3465227
Journal volume & issue
Vol. 17
pp. 18070 – 18082

Abstract

Deep-learning-based methods are increasingly applied to hyperspectral image unmixing (HSU) tasks, among which transformer models have shown superior performance and faster processing speed. However, recently proposed transformer-based HSU models are limited to single-scale feature learning and ignore the learning and interaction of image features across multiple scales. To overcome this limitation, we propose a multiscale visual transformer network using convolution crossing attention (CCA), termed MSCC-ViT. MSCC-ViT extracts feature information from hyperspectral images (HSIs) at different scales and fuses it effectively, allowing the model to simultaneously recognize parts of an HSI with similar spectral features and distinguish parts with different features. Specifically, MSCC-ViT first uses a multiscale feature extraction module composed of 2-D and 3-D convolutional layers to extract preliminary HSI information, which is then fed into a multiscale transformer to learn information at different scales. A CCA module fuses the multiscale information, and finally the decoder uses a simple convolutional layer to reconstruct the original image. The model was tested on one simulated dataset and three real datasets, and the results demonstrate the superiority of the proposed model.
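The abstract describes fusing features across scales with cross-attention: queries from one scale attend to keys and values from another. As an illustration only (not the authors' CCA implementation, whose convolutional projections and dimensions are not given here), a minimal single-head cross-attention step in NumPy, with token counts and channel width chosen arbitrarily:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(fine, coarse):
    """Fuse coarse-scale tokens into fine-scale tokens (sketch).

    fine:   (n_fine, d)   -- queries come from the fine scale
    coarse: (n_coarse, d) -- keys/values come from the coarse scale
    Returns an array of shape (n_fine, d): each fine token becomes a
    weighted combination of coarse tokens.
    """
    d = fine.shape[-1]
    scores = fine @ coarse.T / np.sqrt(d)   # (n_fine, n_coarse)
    weights = softmax(scores, axis=-1)      # attention over coarse tokens
    return weights @ coarse                 # (n_fine, d)

# Toy example: 16 fine-scale tokens and 4 coarse-scale tokens, 8 channels.
rng = np.random.default_rng(0)
fine = rng.standard_normal((16, 8))
coarse = rng.standard_normal((4, 8))
fused = cross_attention(fine, coarse)
print(fused.shape)  # (16, 8)
```

In the paper's setting the queries, keys, and values would additionally pass through learned (convolutional) projections before the attention product; the sketch omits those to show only the scale-crossing step.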

Keywords