IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
MLKNet: A Multi-Stage Spatiotemporal Fusion Network for Remote Sensing Images Based on Large Kernel Attention
Abstract
Currently, within the realm of deep learning-based spatiotemporal fusion, algorithms that rely solely on convolutional operations cannot efficiently extract global image information. Fusion networks that combine convolution with a transformer, in turn, neglect the 2-D structure of remote sensing images and the role of their channels during training, which increases the computational cost. Existing complex fusion methods also introduce noise and disregard the correlation between the time-varying features of low-resolution images and the spatial features of high-resolution images. To address these issues, we first propose TFNet, a temporal feature extraction network that combines standard and depthwise convolutions to better extract temporal features while reducing computational cost. Second, we propose a large-kernel, convolution-based attention module (LAM) to replace the transformer; it adjusts features in both the spatial and channel dimensions while preserving the image structure. Third, for improved image fusion, we design a two-stage fusion module that merges feature maps of various scales and resolutions from multiple perspectives, thereby avoiding the inclusion of noise and producing favorable fusion results. Finally, to encourage the application of spatiotemporal fusion techniques to other satellites, we introduce a new dataset, SW, built from Gaofen-1 and Moderate Resolution Imaging Spectroradiometer (MODIS) images.
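The abstract describes LAM only at a high level: a large-kernel convolutional attention that modulates features spatially and across channels while keeping the 2-D image structure. A minimal sketch of that idea follows, assuming an LKA-style decomposition (large depthwise convolution for spatial context, then a pointwise convolution for channel interaction, with the result used to gate the input); the function names and kernel sizes here are illustrative, not the paper's actual implementation.

```python
import numpy as np

def depthwise_conv2d(x, kernel):
    """'Same'-padded depthwise convolution.
    x: (C, H, W) feature map, kernel: (C, k, k), one filter per channel."""
    C, H, W = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernel[c])
    return out

def large_kernel_attention(x, k=7):
    """Hypothetical LAM-style gating: large depthwise kernel gathers spatial
    context, a 1x1 (pointwise) mix handles channels, and the resulting
    attention map modulates the input element-wise."""
    C = x.shape[0]
    # Large depthwise kernel (here a simple k x k average) captures a wide
    # spatial neighborhood at depthwise cost, not full-conv cost.
    dw = depthwise_conv2d(x, np.full((C, k, k), 1.0 / (k * k)))
    # Pointwise (1x1) weights mix channels; identity is used for illustration.
    w_pw = np.eye(C)
    attn = np.tensordot(w_pw, dw, axes=(1, 0))
    # Attention acts as a per-position, per-channel gate on the input.
    return attn * x
```

Because the attention map is produced by convolutions rather than flattened token attention, the feature map keeps its (C, H, W) layout throughout, which is the structural property the abstract attributes to LAM.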
Keywords