MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images

Min Yuan; Dingbang Ren; Qisheng Feng; Zhaobin Wang; Yongkang Dong; Fuxiang Lu; Xiaolin Wu

doi:10.3390/rs15020361

Remote Sensing (Jan 2023)

Affiliations

Min Yuan: School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
Dingbang Ren: School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
Qisheng Feng: College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730000, China
Zhaobin Wang: School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
Yongkang Dong: School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
Fuxiang Lu: School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
Xiaolin Wu: School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China

DOI: https://doi.org/10.3390/rs15020361
Journal volume & issue: Vol. 15, no. 2, p. 361

Abstract

Semantic segmentation of urban remote sensing images is one of the most crucial tasks in the field of remote sensing. High-resolution remote sensing images contain rich information on ground objects, such as shape, location, and boundary. Segmenting these images is exceedingly challenging because of the large intraclass variance and low interclass variance of these objects. In this article, we propose a multiscale hierarchical channel attention fusion network model based on a transformer and a CNN, which we name the multiscale channel attention fusion network (MCAFNet). MCAFNet uses ResNet-50 and ViT-B/16 to learn the global–local context, which strengthens the semantic feature representation. Specifically, a global–local transformer block (GLTB) is deployed in the encoder stage. This design handles image details at low resolution and extracts global image features better than previous methods. In the decoder module, a channel attention optimization module and a fusion module are added to better integrate high- and low-dimensional feature maps, which enhances the network’s ability to obtain small-scale semantic information. The proposed method is evaluated on the ISPRS Vaihingen and Potsdam datasets. Both quantitative and qualitative evaluations show that MCAFNet performs competitively against mainstream methods. In addition, we performed extensive ablation experiments on the Vaihingen dataset to test the effectiveness of multiple network components.
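
The decoder idea described above (reweighting feature channels, then fusing high-level and low-level feature maps) can be illustrated with a minimal sketch. This is not the MCAFNet implementation: the abstract does not specify the module design, so the squeeze-and-excitation-style pooling, the nearest-neighbour upsampling, the additive fusion, and the function names `channel_attention`, `upsample_nearest`, and `fuse` are all assumptions chosen for illustration. Learned layers are omitted so the example stays parameter-free.

```python
import math


def channel_attention(feat):
    """Return one weight in (0, 1) per channel of a C x H x W map.

    Assumption: global average pooling followed directly by a sigmoid.
    A real module would normally place learned layers in between.
    """
    weights = []
    for channel in feat:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        weights.append(1.0 / (1.0 + math.exp(-mean)))
    return weights


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a C x H x W map by an integer factor."""
    return [
        [[v for v in row for _ in range(factor)]
         for row in channel for _ in range(factor)]
        for channel in feat
    ]


def fuse(low_level, high_level, factor):
    """Upsample the coarse high-level map, reweight it per channel,
    and add it to the fine low-level map (hypothetical additive fusion)."""
    up = upsample_nearest(high_level, factor)
    weights = channel_attention(up)
    return [
        [[w * h + l for h, l in zip(h_row, l_row)]
         for h_row, l_row in zip(h_ch, l_ch)]
        for w, h_ch, l_ch in zip(weights, up, low_level)
    ]


# Toy input: two channels, a 2 x 2 low-level map and a 1 x 1 high-level map.
low = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
high = [[[0.0]], [[2.0]]]
fused = fuse(low, high, factor=2)
```

In this toy input the second high-level channel has the larger mean activation, so it receives the larger attention weight and contributes more strongly to the fused map.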

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
