Semantic Labeling of High-Resolution Images Combining a Self-Cascaded Multimodal Fully Convolution Neural Network with Fully Conditional Random Field

Qiongqiong Hu; Feiting Wang; Jiangtao Fang; Ying Li

doi:10.3390/rs16173300

Remote Sensing (Sep 2024)

Semantic Labeling of High-Resolution Images Combining a Self-Cascaded Multimodal Fully Convolution Neural Network with Fully Conditional Random Field

Qiongqiong Hu,
Feiting Wang,
Jiangtao Fang,
Ying Li

Affiliations

Qiongqiong Hu: School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
Feiting Wang: Department of Computer Technology and Application, Qinghai University, Xi’ning 810016, China
Jiangtao Fang: Department of Computer Technology and Application, Qinghai University, Xi’ning 810016, China
Ying Li: School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China

DOI: https://doi.org/10.3390/rs16173300
Journal volume & issue: Vol. 16, no. 17
p. 3300

Abstract

Read online

Semantic labeling of very high-resolution remote sensing images (VHRRSI) has emerged as a crucial research area in remote sensing image interpretation. However, challenges arise due to significant variations in target orientation and scale, particularly for small targets that are more prone to obscuration and misidentification. The high interclass similarity and low intraclass similarity further exacerbate difficulties in distinguishing objects with similar color and geographic location. To address this concern, we introduce a self-cascading multiscale network (ScasMNet) based on a fully convolutional network, aimed at enhancing the segmentation precision for each category in remote sensing images (RSIs). In ScasMNet, cropped Digital Surface Model (DSM) data and corresponding RGB data are fed into the network via two distinct paths. In the encoder stage, one branch utilizes convolution to extract height information from DSM images layer by layer, enabling better differentiation of trees and low vegetation with similar color and geographic location. A parallel branch extracts spatial, color, and texture information from the RGB data. By cascading the features of different layers, the heterogeneous data are fused to generate complementary discriminative characteristics. Lastly, to refine segmented edges, fully conditional random fields (DenseCRFs) are employed for postprocessing presegmented images. Experimental findings showcase that ScasMNet achieves an overall accuracy (OA) of 92.74% on two challenging benchmarks, demonstrating its outstanding performance, particularly for small-scale objects. This demonstrates that ScasMNet ranks among the state-of-the-art methods in addressing challenges related to semantic segmentation in RSIs.

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/

About the journal

Abstract

Keywords