ISPRS International Journal of Geo-Information (Dec 2024)
A Lightweight CNN-Transformer Implemented via Structural Re-Parameterization and Hybrid Attention for Remote Sensing Image Super-Resolution
Abstract
Remote sensing imagery contains rich information about geographical targets, and performing super-resolution (SR) reconstruction on such images requires greater feature representation capabilities. Convolutional neural network (CNN)-based methods excel at extracting intricate local features but fall short in terms of capturing global representations. While transformer methods are capable of learning long-distance dependencies, they often overlook local feature details, which can diminish the discriminability between the background and the foreground. Moreover, the distinctive architectures of transformers, their extensive parameter counts, and their reliance on large-scale training datasets impose constraints on transformer applications in remote sensing image feature extraction tasks. To address these challenges, this study introduces a novel hybrid CNN-Transformer network model named RepCHAT for remote sensing single image reconstruction, which incorporates a structural re-parameterization technique and a hybrid attention mechanism. This method leverages the strengths of transformers in terms of learning long-distance dependencies (global features) and CNNs with respect to extracting local features. The proposed approach achieves SR reconstruction for remote sensing images with fewer parameters and less computational overhead than those of traditional transformers and high-performance CNN models. We develop a multiscale feature extraction module that integrates both spatial- and frequency-domain features and employs structural re-parameterization theory to increase the inference efficiency of the model. Furthermore, we incorporate depthwise-separable convolution into the transformer block to bolster the local feature learning capabilities of the transformer. The method we propose achieves the optimal performance for remote sensing single-image super-resolution reconstruction and outperforms the competing methods by 0.28–1.05 dB (×4 scale) in terms of signal-to-noise ratio (PSNR). Experimental results indicate that the RepCHAT model proposed in this study maintains a high performance with significantly reduced complexity, making it suitable for deployment on edge devices.
Keywords