A Real-Time Road Scene Semantic Segmentation Model Based on Spatial Context Learning

Xiaomei Xiao; Jialiang Tang; Xiaoyan Lu; Zhengyong Feng; Yi Li

doi:10.1109/ACCESS.2024.3503676

IEEE Access (Jan 2024)

Affiliations

Xiaomei Xiao: School of Electronic Information Engineering, Electronic Information Processing Engineering Technology Research Center, China West Normal University, Nanchong, China
Jialiang Tang: School of Electronic Information Engineering, Electronic Information Processing Engineering Technology Research Center, China West Normal University, Nanchong, China
Xiaoyan Lu: School of Electronic Information Engineering, Electronic Information Processing Engineering Technology Research Center, China West Normal University, Nanchong, China
Zhengyong Feng: School of Electronic Information Engineering, Electronic Information Processing Engineering Technology Research Center, China West Normal University, Nanchong, China
Yi Li: College of Physics and Engineering Technology, Chengdu Normal University, Chengdu, China

DOI: https://doi.org/10.1109/ACCESS.2024.3503676
Journal volume & issue: Vol. 12, pp. 178495–178506

Abstract

To address the high computational complexity and insufficient aggregation of global and local information in existing image segmentation methods, this paper proposes SCLSeg, an efficient segmentation model based on Spatial Context Learning. The main idea is to aggregate local regions into higher-level semantic regions in a learnable manner. The proposed Spatial Context Guided Feature Alignment (SC-FA) module learns aligned features from the image level down to local regions, exploring and integrating contextual information. During training, a multi-scale strategy is used to group semantic regions, and a Channel Aggregation Block (CAB) dynamically captures semantic groups through feature separation and fusion, aggregating multi-level pixel features to generate the final segmentation result. A boundary loss is further introduced to improve the accuracy of segmentation edges. To meet real-time processing requirements, several lightweight strategies and simplified structures are adopted to reduce computational cost, including lightweight encoding, channel compression, and a simplified neck. The method achieves 76.45% mIoU at 237 FPS on the Cityscapes test set and 73.95% mIoU at 300.4 FPS on the CamVid test set.
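
For readers unfamiliar with the mIoU metric quoted in the abstract, the following is a minimal sketch of how mean intersection-over-union is commonly computed from predicted and ground-truth label maps. It is not taken from the paper: the function name `mean_iou`, the `ignore_index` value of 255, and the toy labels are illustrative assumptions, and benchmark evaluations typically accumulate the per-class counts over the whole test set rather than over a single image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred and target are flattened label sequences of equal length.
    Assumes every predicted label lies in [0, num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count toward the score
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatched pixel belongs to the union of both classes.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example (hypothetical labels): three classes, one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18.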

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
