IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
Cross-Scale Interaction With Spatial-Spectral Enhanced Window Attention for Pansharpening
Abstract
Pansharpening fuses a multispectral (MS) image with a panchromatic (PAN) image to generate a high-resolution multispectral (HRMS) image. Current methods often overlook scale inconsistency and the correlations within and between window domains, resulting in suboptimal outcomes. In addition, deep convolutional neural networks and transformers often incur high computational costs. To address these challenges, we present a lightweight pansharpening network that leverages cross-scale interaction and spatial-spectral enhanced window attention. We first design a spatial-spectral enhanced window transformer (SEWformer) to effectively capture crucial attention within and between interleaved windows. To improve scale consistency, we develop a cross-scale interactive encoder in which the attentions derived from the SEWformer at different scales interact with one another. Furthermore, a multiscale residual network with channel attention is constructed as a decoder, which, in conjunction with the encoder, ensures precise detail extraction. The final HRMS image is obtained by combining the extracted details with the upsampled MS (UPMS) image. Extensive experiments on diverse datasets demonstrate the superiority of our approach over state-of-the-art pansharpening techniques in terms of both performance and efficiency. Compared with the second-best approach, our method improves the ERGAS metric by 29.6% on IKONOS, 43.8% on Pléiades, and 27.6% on WorldView-3.
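For readers unfamiliar with the quantities mentioned above, the following is a minimal NumPy sketch, not the authors' implementation. It illustrates the final step (adding extracted details to the UPMS image) and the standard ERGAS definition, 100 · (1/ratio) · sqrt(mean over bands of (RMSE_k / μ_k)²), where lower values are better. The resolution ratio of 4, the toy data, the simple addition used to combine details with the UPMS image, and the helper names `ergas` and `relative_improvement` are assumptions made for illustration. Reading the reported percentages as a relative reduction in ERGAS is likewise an assumption.

```python
import numpy as np


def ergas(reference, fused, ratio=4):
    """ERGAS between two images of shape (bands, H, W).

    ratio is the PAN-to-MS resolution ratio (4 is assumed here).
    Lower is better; identical images give 0.
    """
    reference = np.asarray(reference, dtype=np.float64)
    fused = np.asarray(fused, dtype=np.float64)
    # Per-band RMSE and per-band mean of the reference image.
    rmse = np.sqrt(np.mean((reference - fused) ** 2, axis=(1, 2)))
    mean = np.mean(reference, axis=(1, 2))
    return 100.0 / ratio * np.sqrt(np.mean((rmse / mean) ** 2))


def relative_improvement(baseline, ours):
    """Percentage reduction of a lower-is-better metric relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline


# Toy data (hypothetical): a 4-band reference and an upsampled MS image
# that differs from it by a constant offset.
rng = np.random.default_rng(0)
reference = rng.uniform(0.2, 1.0, size=(4, 8, 8))
upms = reference + 0.05

# Stand-in for the details a network would predict; here they cancel
# the offset exactly, which a real model would only approximate.
details = np.full_like(reference, -0.05)

# Final step: combine the extracted details with the UPMS image
# (plain addition is assumed here).
hrms = upms + details

print("ERGAS of UPMS:", ergas(reference, upms))
print("ERGAS of HRMS:", ergas(reference, hrms))
```

Under this reading, a method that lowers ERGAS from 2.0 to 1.5 would report `relative_improvement(2.0, 1.5)`, a 25% improvement.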
Keywords