IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
A Joint Architecture of Mixed-Attention Transformer and Octave Module for Hyperspectral Image Denoising
Abstract
Convolutional neural networks (CNNs) have recently achieved impressive performance in hyperspectral image denoising. However, current CNNs are limited in exploring the spectral correlations across bands and the interactions among features within each band. Although transformers have been introduced to capture spatial-spectral correlation in hyperspectral images (HSIs), they generally explore either the intracorrelation within bands or the intercorrelation between bands, neglecting the combination of intra- and intercorrelation in HSI cubes. Moreover, transformer-based methods rarely handle hierarchical (i.e., low- and high-level) features in an adaptive manner: features at different levels are of different importance, yet these methods treat them equally. To alleviate these limitations, we introduce a joint architecture, termed MAOTformer, of mixed-attention transformers and octave modules for HSI denoising. On the one hand, a mixed-attention transformer block (MATB) is designed to simultaneously capture pixel relationships both across and within bands by incorporating naive spatial self-attention, bidirectional recurrent channel attention, and progressive channel attention. In addition, a U-net built on mixed attention is equipped with attentive skip connections, which enables the proposed MAOTformer to extract hierarchical features through the U-net and to fuse these features adaptively through the attentive skip connections. On the other hand, we introduce an octave module after each MATB to exploit multiscale features for separating the noise residing in high-frequency components. Extensive experiments on synthetic and real-world HSIs show that the proposed method outperforms state-of-the-art methods.
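For intuition on the frequency-splitting idea behind the octave module, the sketch below shows a generic octave convolution of the kind such modules build on: a high-frequency stream kept at full resolution, a low-frequency stream at half resolution, and cross-stream exchanges so that noise concentrated in high frequencies can be processed separately from smooth structure. This is a minimal PyTorch sketch under assumed settings; the class name OctaveConv, the split ratio alpha, and all shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an octave convolution (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    def __init__(self, in_ch, out_ch, alpha=0.5, kernel_size=3, padding=1):
        super().__init__()
        in_lo, out_lo = int(alpha * in_ch), int(alpha * out_ch)
        in_hi, out_hi = in_ch - in_lo, out_ch - out_lo
        # Four paths: high->high, high->low, low->high, low->low.
        self.hh = nn.Conv2d(in_hi, out_hi, kernel_size, padding=padding)
        self.hl = nn.Conv2d(in_hi, out_lo, kernel_size, padding=padding)
        self.lh = nn.Conv2d(in_lo, out_hi, kernel_size, padding=padding)
        self.ll = nn.Conv2d(in_lo, out_lo, kernel_size, padding=padding)

    def forward(self, x_hi, x_lo):
        # High-frequency stream stays at full resolution; the low-frequency
        # stream runs at half resolution and is upsampled before the exchange.
        y_hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo), scale_factor=2, mode="nearest")
        y_lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo

# Usage: a 64-channel feature map split into high/low frequency streams.
x = torch.randn(1, 64, 64, 64)              # (batch, channels, H, W)
x_hi, x_lo = x[:, :32], F.avg_pool2d(x[:, 32:], 2)
oct_conv = OctaveConv(in_ch=64, out_ch=64, alpha=0.5)
y_hi, y_lo = oct_conv(x_hi, x_lo)
print(y_hi.shape, y_lo.shape)               # [1, 32, 64, 64], [1, 32, 32, 32]
```

The design choice to keep two resolutions is what lets a denoiser treat high-frequency content, where most noise lives, with operators distinct from those applied to low-frequency structure.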
Keywords