Depth-Wise Asymmetric Bottleneck With Point-Wise Aggregation Decoder for Real-Time Semantic Segmentation in Urban Scenes

Gen Li; Shenlu Jiang; Inyong Yun; Jonghyun Kim; Joongkyu Kim

doi:10.1109/ACCESS.2020.2971760

IEEE Access (Jan 2020)

Affiliations

Gen Li: ORCiD; Department of Electronic, Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea
Shenlu Jiang: ORCiD; Department of Electronic, Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea
Inyong Yun: ORCiD; Department of Electronic, Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea
Jonghyun Kim: ORCiD; Department of Electronic, Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea
Joongkyu Kim: ORCiD; Department of Electronic, Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea

Journal volume & issue: Vol. 8, pp. 27495–27506

Abstract

Semantic segmentation assigns a class label to each pixel in an image and is widely used in autonomous vehicles and robotics. Although deep learning methods have made great progress in semantic segmentation, they either achieve strong results at the cost of numerous parameters or use lightweight designs that heavily sacrifice segmentation accuracy. Because of the strict requirements of real-world applications, it is critical to design an effective real-time model with both competitive segmentation accuracy and small model capacity. In this paper, we propose a lightweight network named DABNet, which employs Depth-wise Asymmetric Bottleneck (DAB) and Point-wise Aggregation Decoder (PAD) modules to tackle the challenging task of real-time semantic segmentation in urban scenes. Specifically, the DAB module creates a sufficient receptive field and densely utilizes contextual information, and the PAD module aggregates feature maps of different scales through an attention mechanism to optimize performance. Compared with existing methods, our network substantially reduces the number of parameters while still achieving high accuracy with real-time inference ability. Extensive ablation experiments on two challenging urban scene datasets (Cityscapes and CamVid) demonstrate the effectiveness of the proposed approach for real-time semantic segmentation.
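
The parameter savings the abstract attributes to a depth-wise asymmetric design can be illustrated with simple weight counting. The sketch below is not the DAB module itself, whose exact layer layout is not given in the abstract. It assumes a 64-channel layer, a kernel size of 3, and no bias terms (all hypothetical choices), and compares a dense 3x3 convolution with a pair of depth-wise 3x1 and 1x3 convolutions.

```python
# Illustrative weight counting only; channel count and kernel size are assumptions.

def standard_conv_params(c_in, c_out, kh, kw):
    """Weights in a dense kh x kw convolution (bias omitted)."""
    return c_in * c_out * kh * kw


def depthwise_conv_params(channels, kh, kw):
    """Weights in a depth-wise kh x kw convolution: one filter per channel."""
    return channels * kh * kw


def depthwise_asymmetric_pair_params(channels, k=3):
    """Weights in a k x 1 depth-wise conv followed by a 1 x k depth-wise conv."""
    return depthwise_conv_params(channels, k, 1) + depthwise_conv_params(channels, 1, k)


channels = 64  # hypothetical width, not taken from the paper
dense = standard_conv_params(channels, channels, 3, 3)
asym = depthwise_asymmetric_pair_params(channels, 3)
print(f"dense 3x3: {dense} weights")
print(f"depth-wise 3x1 + 1x3: {asym} weights")
print(f"reduction factor: {dense // asym}x")
```

Under these assumptions the depth-wise asymmetric pair uses far fewer weights than the dense layer. A practical bottleneck would also need point-wise (1x1) convolutions to mix information across channels, which this count leaves out.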

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
