MGFNet: An MLP-dominated gated fusion network for semantic segmentation of high-resolution multi-modal remote sensing images

Kan Wei; JinKun Dai; Danfeng Hong; Yuanxin Ye

International Journal of Applied Earth Observations and Geoinformation (Dec 2024)

Affiliations

Kan Wei: The Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, 610031, China
JinKun Dai: The Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, 610031, China
Danfeng Hong: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, 100094, China
Yuanxin Ye: The Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, 610031, China; Corresponding author.

Journal volume & issue: Vol. 135, p. 104241

Abstract

The heterogeneity and complexity of multi-modal data in high-resolution remote sensing images make it difficult for existing cross-modal networks to fuse the complementary information of high-resolution optical and synthetic aperture radar (SAR) images for precise semantic segmentation. To address this issue, this paper proposes a multi-layer perceptron (MLP) dominated gated fusion network (MGFNet). MGFNet consists of three modules: a multi-path feature extraction network, an MLP-gate fusion (MGF) module, and a decoder. First, MGFNet extracts features from the high-resolution optical and SAR images independently while preserving spatial information. Then, the MGF module combines the multi-modal features through a channel attention stage and a gated fusion stage, using an MLP as a gate to exploit complementary information and filter out redundant information. Additionally, we introduce a novel high-resolution multi-modal remote sensing dataset, YESeg-OPT-SAR, with a spatial resolution of 0.5 m. To evaluate MGFNet, we compare it with several state-of-the-art (SOTA) models on the YESeg-OPT-SAR and Pohang datasets, both of which are high-resolution multi-modal datasets. The experimental results show that MGFNet achieves higher scores on the evaluation metrics than the other models, indicating its effectiveness in multi-modal feature fusion for segmentation. The source code and data are available at https://github.com/yeyuanxin110/YESeg-OPT-SAR.
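
The two stages named in the abstract (channel attention followed by MLP-gated fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the squeeze-and-excitation style channel attention, the per-pixel two-layer MLP gate, the convex-combination fusion rule, the random weights, and all function and parameter names (`channel_attention`, `mlp_gate_fusion`, `init_params`, `reduction`, `hidden`) are assumptions chosen only to show the general idea of letting a learned gate decide, per pixel and per channel, how much each modality contributes.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) feature map (assumed SE-style attention)."""
    squeezed = feat.mean(axis=(1, 2))        # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # bottleneck with ReLU -> (C // r,)
    weights = sigmoid(w2 @ hidden)           # per-channel weights in (0, 1) -> (C,)
    return feat * weights[:, None, None]


def mlp_gate_fusion(opt, sar, params):
    """Fuse optical and SAR features with a per-pixel MLP gate (illustrative only)."""
    opt_att = channel_attention(opt, params["opt_w1"], params["opt_w2"])
    sar_att = channel_attention(sar, params["sar_w1"], params["sar_w2"])
    c, h, w = opt_att.shape

    # Concatenate both modalities per pixel: (H*W, 2C).
    stacked = np.concatenate([opt_att, sar_att], axis=0).reshape(2 * c, h * w).T

    # Two-layer MLP applied to every pixel; the output acts as the gate.
    hidden = np.maximum(stacked @ params["gate_w1"], 0.0)  # (H*W, hidden)
    gate = sigmoid(hidden @ params["gate_w2"])             # (H*W, C), values in (0, 1)
    gate = gate.T.reshape(c, h, w)

    # Assumed fusion rule: convex combination of the two attended feature maps.
    return gate * opt_att + (1.0 - gate) * sar_att


def init_params(channels, reduction=4, hidden=16, seed=0):
    """Random weights standing in for trained parameters (hypothetical shapes)."""
    rng = np.random.default_rng(seed)
    r = max(channels // reduction, 1)
    return {
        "opt_w1": rng.normal(scale=0.1, size=(r, channels)),
        "opt_w2": rng.normal(scale=0.1, size=(channels, r)),
        "sar_w1": rng.normal(scale=0.1, size=(r, channels)),
        "sar_w2": rng.normal(scale=0.1, size=(channels, r)),
        "gate_w1": rng.normal(scale=0.1, size=(2 * channels, hidden)),
        "gate_w2": rng.normal(scale=0.1, size=(hidden, channels)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    optical = rng.normal(size=(8, 16, 16))  # stand-in optical features (C, H, W)
    sar = rng.normal(size=(8, 16, 16))      # stand-in SAR features (C, H, W)
    fused = mlp_gate_fusion(optical, sar, init_params(channels=8))
    print(fused.shape)
```

Because the gate lies strictly between 0 and 1, each fused value in this sketch falls between the corresponding optical and SAR values, so the gate can suppress a redundant or noisy modality at a given location without discarding it globally. The published MGF module may differ in every one of these details; the repository linked in the abstract is the authoritative source.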

Published in International Journal of Applied Earth Observations and Geoinformation

ISSN: 1569-8432 (Print); 1872-826X (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Geography. Anthropology. Recreation: Physical geography; Geography. Anthropology. Recreation: Environmental sciences
Website: https://www.journals.elsevier.com/international-journal-of-applied-earth-observation-and-geoinformation
