Complex & Intelligent Systems (Aug 2024)
HRDLNet: a semantic segmentation network with high resolution representation for urban street view images
Abstract
Semantic segmentation of urban street scenes has attracted much attention in the field of autonomous driving: it not only helps vehicles perceive the environment in real time, but also significantly improves the decision-making ability of autonomous driving systems. However, most current methods based on Convolutional Neural Networks (CNNs) first encode the input image to a low resolution and then attempt to recover the high resolution, which leads to loss of spatial information, accumulation of errors, and difficulty in handling large scale variations. To address these problems, in this paper we propose a new semantic segmentation network with high-resolution representation (HRDLNet) for urban street scene images, which improves segmentation accuracy by maintaining a high-resolution representation of the image throughout. Specifically, we propose a feature extraction module with high-resolution representation (FHR), which handles multi-scale targets and high-resolution image information by efficiently fusing high-resolution information with multi-scale features. Secondly, we design a multi-scale feature extraction enhancement (MFE) module, which significantly expands the receptive field of the network, enhancing its ability to capture correlations between image details and global contextual information. In addition, we introduce a dual-attention mechanism module (CSD), which dynamically adjusts the network to capture subtle features and rich semantic information more accurately. We trained and evaluated HRDLNet on the Cityscapes dataset and the PASCAL VOC 2012 Augmented dataset, verifying the model's excellent performance in urban street view image segmentation. Comparisons with state-of-the-art methods further confirm the unique advantages of HRDLNet for semantic segmentation of urban street scenes.
Keywords