IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
MSCC-ViT: A Multiscale Visual-Transformer Network Using Convolution Crossing Attention for Hyperspectral Unmixing
Abstract
Deep-learning-based methods are increasingly being applied to hyperspectral unmixing (HSU) tasks, among which the transformer model has shown superior performance and fast processing speed. However, recently proposed transformer-based HSU models are limited to single-scale feature learning and ignore the learning and interaction of image features at multiple scales. To overcome this limitation, we propose MSCC-ViT, a multiscale visual transformer network that uses convolution crossing attention (CCA). MSCC-ViT extracts features from hyperspectral images (HSIs) at different scales and fuses them effectively, allowing the model to simultaneously recognize regions of an HSI with similar spectral features and distinguish regions with different ones. Specifically, MSCC-ViT first uses a multiscale feature extraction module composed of 2-D and 3-D convolutional layers to preliminarily extract HSI information, which is then fed into a multiscale transformer to learn information at different scales. A CCA module fuses the multiscale information. Finally, the decoder uses a simple convolutional layer to reconstruct the original image. The model was tested on one simulated dataset and three real datasets, and the results demonstrate the superiority of the proposed model.
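To make the encoder-decoder pipeline described above concrete, the following is a minimal PyTorch sketch of an autoencoder-style unmixing network in the spirit of MSCC-ViT. All layer sizes, module names, and the single-scale transformer stand-in are illustrative assumptions rather than the authors' implementation; the multiscale branches and the CCA fusion module are omitted for brevity.

```python
# Hypothetical sketch of an autoencoder-style unmixing pipeline (not the authors' code).
import torch
import torch.nn as nn


class ToyUnmixingNet(nn.Module):
    def __init__(self, bands=198, endmembers=4, dim=64):
        super().__init__()
        # Preliminary feature extraction: a 3-D conv over (spectral, spatial) followed
        # by a 2-D conv, loosely mirroring the paper's 2-D/3-D extraction module.
        self.conv3d = nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(3, 1, 1))
        self.conv2d = nn.Conv2d(8 * bands, dim, kernel_size=3, padding=1)
        # Single-scale stand-in for the multiscale transformer: tokens are the
        # spatial positions of a patch, each embedded to `dim` channels.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Abundance head: per-pixel fractions over endmembers (sum-to-one via softmax).
        self.abundance = nn.Conv2d(dim, endmembers, kernel_size=1)
        # Decoder: a 1x1 conv acts as a linear mixing of learned endmember spectra,
        # reconstructing the input cube from the abundance maps.
        self.decoder = nn.Conv2d(endmembers, bands, kernel_size=1, bias=False)

    def forward(self, x):                          # x: (B, bands, H, W)
        b, _, h, w = x.shape
        f = self.conv3d(x.unsqueeze(1))            # (B, 8, bands, H, W)
        f = self.conv2d(f.flatten(1, 2))           # (B, dim, H, W)
        tokens = f.flatten(2).transpose(1, 2)      # (B, H*W, dim)
        tokens = self.transformer(tokens)
        f = tokens.transpose(1, 2).reshape(b, -1, h, w)
        a = torch.softmax(self.abundance(f), dim=1)  # abundances, (B, endmembers, H, W)
        recon = self.decoder(a)                      # reconstruction, (B, bands, H, W)
        return recon, a


# Usage: unmix two 8x8 patches of a 198-band cube and inspect the output shapes.
if __name__ == "__main__":
    net = ToyUnmixingNet()
    cube = torch.rand(2, 198, 8, 8)
    recon, abund = net(cube)
    print(recon.shape, abund.shape)  # (2, 198, 8, 8) and (2, 4, 8, 8)
```

As in most autoencoder-based unmixing formulations, the 1x1 decoder weights can be read as the endmember signatures and the softmax activations as the abundance maps, so training to minimize reconstruction error yields both quantities jointly.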
Keywords