A Joint Architecture of Mixed-Attention Transformer and Octave Module for Hyperspectral Image Denoising

Mahmood Ashraf; Lihui Chen; Xichuan Zhou; Muhammad Allah Rakha

doi:10.1109/JSTARS.2024.3356523

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

Affiliations

Mahmood Ashraf: School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
Lihui Chen: School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
Xichuan Zhou: School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
Muhammad Allah Rakha: Department of Computer Science, FAST-NUCES University, Peshawar, Pakistan

DOI: https://doi.org/10.1109/JSTARS.2024.3356523
Journal volume & issue: Vol. 17, pp. 4331–4349

Abstract

Convolutional neural networks (CNNs) have recently achieved impressive performance in hyperspectral image denoising. However, current CNNs are limited in their ability to explore spectral correlations across bands and the interactions among features within each band. Although transformers have been introduced to capture spatial-spectral correlation in hyperspectral images (HSIs), they generally explore either the correlation within bands (intraband) or the correlation between bands (interband), neglecting the combination of the two in HSI cubes. In addition, transformer methods rarely address hierarchical (i.e., low- and high-level) features in an adaptive manner: features at different levels differ in importance, yet these methods treat them equally. To alleviate these limitations, we introduce a joint architecture of mixed-attention transformers and octave modules for HSI denoising, called MAOTformer. On the one hand, the mixed-attention transformer (MATB) is designed to capture inter- and intraband pixel relationships simultaneously by incorporating naive spatial self-attention, bidirectional recurrent channel attention, and progressive channel attention. Furthermore, a U-net based on mixed attention is equipped with attentive skip connections in MATB, which enables the proposed MAOTformer to explore hierarchical features through the U-net and to connect them adaptively through the attentive skip connections. On the other hand, we introduce an octave module behind each MATB to utilize multiscale features for separating noise in high-frequency components. Extensive experiments on synthetic and real-world HSIs show that the proposed method outperforms state-of-the-art methods.
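
The abstract says the octave module uses multiscale features to separate noise in high-frequency components. The sketch below illustrates only the general idea of such a split on a single band; it is not the authors' octave module, which the abstract does not specify in detail. The function names `split_octave` and `merge_octave`, the fixed 2x2 average pooling, and the toy 4x4 band are all assumptions made for illustration.

```python
def split_octave(band):
    """Split one 2D band into a half-resolution low-frequency part and a
    full-resolution high-frequency residual (illustrative, not learned)."""
    h, w = len(band), len(band[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Low frequency: 2x2 average pooling halves the spatial resolution.
    low = [
        [
            (band[i][j] + band[i][j + 1] + band[i + 1][j] + band[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]
    # High frequency: what remains after subtracting the upsampled low part.
    high = [[band[i][j] - low[i // 2][j // 2] for j in range(w)] for i in range(h)]
    return low, high


def merge_octave(low, high):
    """Invert split_octave by nearest-neighbour upsampling and adding."""
    return [
        [high[i][j] + low[i // 2][j // 2] for j in range(len(high[0]))]
        for i in range(len(high))
    ]


# Hypothetical toy band; a real HSI cube would hold many such bands.
band = [
    [1.0, 3.0, 2.0, 2.0],
    [1.0, 3.0, 2.0, 2.0],
    [5.0, 5.0, 0.0, 4.0],
    [5.0, 5.0, 4.0, 0.0],
]
low, high = split_octave(band)
restored = merge_octave(low, high)
```

In this toy decomposition, rapid pixel-to-pixel variation ends up in `high` while `low` keeps the smooth content at half resolution, and adding the two back together recovers the input. A learned octave module would presumably replace the fixed pooling and subtraction with convolutions that exchange information between the two branches.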

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443
