Sensors (Feb 2025)
A Method Combining Discrete Cosine Transform with Attention for Multi-Temporal Remote Sensing Image Matching
Abstract
Multi-temporal remote sensing image matching is crucial for tasks such as drone positioning under satellite-denial conditions, natural disaster monitoring, and land-cover change detection. However, the significant differences between multi-temporal images often reduce the accuracy of most image matching methods in these scenarios, or cause them to fail outright. To address this challenge, this paper introduces the discrete cosine transform (DCT) for frequency analysis tailored to the characteristics of remote sensing images, and proposes a network that combines the DCT with attention mechanisms for multi-scale feature matching. First, DCT-enhanced channel attention is embedded in the multi-scale feature extraction module to capture richer ground object information. Second, for coarse-scale feature matching, DCT-guided sparse attention is proposed for feature enhancement; it suppresses the impact of temporal differences on matching while keeping the computational cost controllable. The coarse-scale matching results are then refined on the fine-scale feature map to obtain the final matches. Our method achieved correct keypoint percentages of 81.92% and 88.48%, with average corner errors of 4.27 and 2.98 pixels, on the DSIFN and LEVIR-CD datasets, respectively, while maintaining a high inference speed. The experimental results demonstrate that our method outperforms state-of-the-art methods in both robustness and efficiency in multi-temporal scenarios.
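To make the idea of DCT-enhanced channel attention concrete, the following is a minimal NumPy sketch of one common way to build channel descriptors from DCT basis functions instead of plain global average pooling. It is an illustration under stated assumptions, not the network described in this paper: the function names `dct_basis` and `dct_channel_attention`, the grouping of channels by frequency, and the two-layer gating with weights `w1` and `w2` are all hypothetical choices made for the example.

```python
import numpy as np


def dct_basis(h, w, u, v):
    """Unnormalized 2D DCT-II basis function of frequency (u, v) on an h x w grid."""
    ys = np.cos(np.pi * u * (np.arange(h) + 0.5) / h)
    xs = np.cos(np.pi * v * (np.arange(w) + 0.5) / w)
    return np.outer(ys, xs)


def dct_channel_attention(feat, freqs, w1, w2):
    """Reweight channels using DCT-pooled descriptors (illustrative assumption).

    feat:  feature map of shape (C, H, W)
    freqs: list of (u, v) frequency pairs; channels are split into len(freqs) groups
    w1:    weights of shape (C // r, C) for the squeeze layer (hypothetical)
    w2:    weights of shape (C, C // r) for the excitation layer (hypothetical)
    """
    c, h, w = feat.shape
    groups = np.array_split(np.arange(c), len(freqs))
    desc = np.empty(c)
    for idx, (u, v) in zip(groups, freqs):
        basis = dct_basis(h, w, u, v)
        # Project each channel in the group onto one DCT basis function.
        desc[idx] = (feat[idx] * basis).sum(axis=(1, 2)) / (h * w)
    hidden = np.maximum(w1 @ desc, 0.0)                # ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gate per channel
    return feat * weights[:, None, None], desc, weights
```

With the single frequency `(0, 0)` the basis is constant, so the descriptor reduces to global average pooling; adding higher frequencies lets the gate respond to spatial variation within a channel that averaging would discard.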
Keywords