Multi-scale Adaptive Feature Fusion Network for Semantic Segmentation in Remote Sensing Images

Ronghua Shang; Jiyu Zhang; Licheng Jiao; Yangyang Li; Naresh Marturi; Rustam Stolkin

doi:10.3390/rs12050872

Remote Sensing (Mar 2020)

Affiliations

Ronghua Shang: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Jiyu Zhang: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Licheng Jiao: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Yangyang Li: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Naresh Marturi: Extreme Robotics Laboratory, University of Birmingham, Edgbaston B15 2TT, UK
Rustam Stolkin: Extreme Robotics Laboratory, University of Birmingham, Edgbaston B15 2TT, UK

DOI: https://doi.org/10.3390/rs12050872
Journal volume & issue: Vol. 12, no. 5, p. 872

Abstract

Semantic segmentation of high-resolution remote sensing images is highly challenging because of complicated backgrounds, irregular target shapes, and similar appearance across multiple target categories. Most existing segmentation methods rely only on simple fusion of the extracted multi-scale features and often fail to provide satisfactory results when target sizes differ greatly. To address this problem through multi-scale context extraction and efficient fusion of multi-scale features, this paper presents an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation in remote sensing images. MANet is an encoder-decoder structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilation rates, together with global average pooling, to extract context information at multiple scales in parallel. MANet embeds a channel attention mechanism to fuse semantic features. High- and low-level semantic information is concatenated, and global average pooling is applied to generate global features. These global features serve as initial channel weights, from which a fully connected layer derives adaptive weight information for each channel. To achieve an efficient fusion, these tuned weights are applied to the fused features. The proposed method is evaluated against six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments on the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other networks, with overall accuracy reaching 89.4% and 88.2%, respectively, and average F1 reaching 90.4% and 86.7%, respectively.
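The channel-attention fusion described in the abstract (concatenate, global average pooling, fully connected layer, reweight) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `adaptive_fusion`, the tensor shapes, the random weights, and the sigmoid used to squash the weights are all assumptions made for illustration, and both feature maps are assumed to have already been brought to the same spatial size.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fusion(low, high, fc_weight, fc_bias):
    """Fuse two (C, H, W) feature maps with per-channel attention weights.

    Hypothetical sketch: one fully connected layer followed by a sigmoid
    (the sigmoid is an assumption; the abstract does not name an activation).
    """
    # Concatenate low- and high-level features along the channel axis.
    fused = np.concatenate([low, high], axis=0)        # (2C, H, W)
    # Global average pooling gives one global descriptor per channel.
    pooled = fused.mean(axis=(1, 2))                   # (2C,)
    # A fully connected layer turns the descriptors into channel weights.
    weights = sigmoid(fc_weight @ pooled + fc_bias)    # (2C,), each in (0, 1)
    # Apply the weights to the fused features, channel by channel.
    return fused * weights[:, None, None], weights

# Toy inputs with illustrative sizes only.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
low = rng.standard_normal((C, H, W))
high = rng.standard_normal((C, H, W))
fc_weight = rng.standard_normal((2 * C, 2 * C)) * 0.1
fc_bias = np.zeros(2 * C)

out, weights = adaptive_fusion(low, high, fc_weight, fc_bias)
print(out.shape, weights.shape)
```

In a trained network the fully connected weights would be learned rather than random, so channels that carry more useful information for the current input would receive larger weights.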

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
