IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

A CNN-Transformer Combined Remote Sensing Imagery Spatiotemporal Fusion Model

  • Mingyu Jiang,
  • Hua Shao

DOI
https://doi.org/10.1109/JSTARS.2024.3435739
Journal volume & issue
Vol. 17
pp. 13995 – 14009

Abstract


Spatiotemporal fusion (STF) of remote sensing images (RSIs) makes a significant contribution to acquiring RSI sequences with both high temporal and high spatial resolution, which broadens their fields of application. However, existing RSI STF methods lack effective strategies for extracting global information and for fusing features across different images. Moreover, existing state-of-the-art (SOTA) methods generally require more than two reference RSIs from different satellites, which makes data collection more difficult and limits their practical application. To address these problems, this article proposes an end-to-end CNN and Transformer combined RSI STF model (CTSTFM) based on two reference RSIs. Specifically, CTSTFM consists of three basic modules: a multikernel convolutional transformer encoder (MKCE), a cross fusion module (CFM), and a convolutional-based compression decoder (CCD). The MKCE combines a multikernel channel attention block and a multikernel spatial attention block to extract shallow features and long-range interdependencies in the reference RSIs. The CFM uses a cross exchange transformer block and a combine fusion transformer block to enhance the feature fusion results. Because the encoder and fusion module are powerful, the CCD uses only a simply designed convolution module, which reduces computational resource consumption. Experiments on two well-known open-access datasets show that CTSTFM achieves competitive results compared with SOTA methods in both qualitative and quantitative comparisons. We also conduct an experiment to analyze the image tessellation effect and its solution. Ablation experiments demonstrate the effectiveness of the proposed modules.
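The abstract does not describe the internals of the cross exchange transformer block. As a rough illustration of the general idea of letting features from two reference RSIs exchange information, here is a minimal sketch of generic symmetric cross-attention. It assumes the token features of each image are already extracted, and it omits learned query/key/value projections, multiple heads, and normalization. The names `cross_attention` and `cross_exchange` are hypothetical; this is not the authors' block.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are tokens, columns are feature channels


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention where queries come from one feature set and
    keys/values from another. Identity projections are an assumption made
    for illustration only."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    n_channels = len(values[0])
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other image's value vectors for this query token.
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(n_channels)])
    return out


def cross_exchange(feat_a: Matrix, feat_b: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical symmetric exchange: each stream attends to the other and
    adds the result back to its own features (residual connection)."""
    a_from_b = cross_attention(feat_a, feat_b, feat_b)
    b_from_a = cross_attention(feat_b, feat_a, feat_a)
    new_a = [[x + y for x, y in zip(r, s)] for r, s in zip(feat_a, a_from_b)]
    new_b = [[x + y for x, y in zip(r, s)] for r, s in zip(feat_b, b_from_a)]
    return new_a, new_b


if __name__ == "__main__":
    ref_1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 tokens, 2 channels
    ref_2 = [[0.2, 0.8], [0.9, 0.1]]              # 2 tokens, 2 channels
    a, b = cross_exchange(ref_1, ref_2)
    print(len(a), len(a[0]), len(b), len(b[0]))  # → 3 2 2 2
```

Each stream keeps its own token count and channel width, so the two inputs may come from images tiled into different numbers of patches. A real implementation would normally use a framework layer with learned projections rather than this loop-based version.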

Keywords