Remote Sensing (Oct 2024)
Multi-Window Fusion Spatial-Frequency Joint Self-Attention for Remote-Sensing Image Super-Resolution
Abstract
Remote-sensing images typically have large spatial dimensions and contain repeated texture patterns. To capture fine details and encode comprehensive information, feature-extraction networks with large receptive fields are essential for remote-sensing image super-resolution. However, mainstream methods built from stacked Transformer modules suffer from limited receptive fields due to their fixed window sizes, which impairs long-range dependency capture and fine-grained texture reconstruction. In this paper, we propose a spatial-frequency joint self-attention network based on multi-window fusion (MWSFA). Specifically, our approach introduces a multi-window fusion strategy that merges windows with similar textures so that the self-attention mechanism can capture long-range dependencies effectively, thereby expanding the receptive field of the feature extractor. Additionally, we incorporate a frequency-domain self-attention branch in parallel with the original Transformer architecture. This branch exploits the global nature of the frequency domain to further extend the receptive field, enabling more comprehensive self-attention computation across different frequency bands and better use of consistent frequency information. Extensive experiments on both synthetic and real remote-sensing datasets demonstrate that our method achieves better visual reconstruction quality and higher evaluation metrics than other super-resolution methods.
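The multi-window fusion idea can be illustrated with a minimal NumPy sketch: partition a feature map into windows, greedily group windows whose mean descriptors are cosine-similar, and run self-attention jointly over each merged group so attention spans beyond a single window. The greedy grouping rule, the `sim_thresh` parameter, and the identity Q/K/V projections are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Scaled dot-product self-attention over a token set
    # (identity Q/K/V projections to keep the sketch minimal).
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    return softmax(scores) @ tokens

def fused_window_attention(feat, win=4, sim_thresh=0.9):
    """Hypothetical multi-window fusion: split an (H, W, C) feature map
    into win x win windows, merge texture-similar windows, and attend
    jointly within each merged group."""
    H, W, C = feat.shape
    wins = []
    for i in range(0, H, win):
        for j in range(0, W, win):
            wins.append(feat[i:i+win, j:j+win].reshape(-1, C))
    # Window descriptor: L2-normalised mean feature vector.
    desc = np.stack([w.mean(axis=0) for w in wins])
    desc /= np.linalg.norm(desc, axis=1, keepdims=True) + 1e-8
    # Greedy grouping of windows by descriptor cosine similarity.
    groups, assigned = [], [False] * len(wins)
    for a in range(len(wins)):
        if assigned[a]:
            continue
        group, assigned[a] = [a], True
        for b in range(a + 1, len(wins)):
            if not assigned[b] and desc[a] @ desc[b] > sim_thresh:
                group.append(b)
                assigned[b] = True
        groups.append(group)
    # Joint attention within each fused group of windows.
    out = [None] * len(wins)
    n = win * win
    for group in groups:
        att = self_attention(np.concatenate([wins[g] for g in group]))
        for k, g in enumerate(group):
            out[g] = att[k*n:(k+1)*n]
    # Stitch the windows back into the (H, W, C) layout.
    res, idx = np.zeros_like(feat), 0
    for i in range(0, H, win):
        for j in range(0, W, win):
            res[i:i+win, j:j+win] = out[idx].reshape(win, win, C)
            idx += 1
    return res
```

Because similar windows attend jointly, a repeated texture at opposite corners of the image contributes to the same attention computation, which is the mechanism the abstract credits for the enlarged receptive field.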
Keywords