IEEE Access (Jan 2023)

Multi-Encoder Context Aggregation Network for Structured and Unstructured Urban Street Scene Analysis

  • Tanmay Singha
  • Duc-Son Pham
  • Aneesh Krishna

DOI
https://doi.org/10.1109/ACCESS.2023.3289968
Journal volume & issue
Vol. 11
pp. 66227 – 66244

Abstract

Developing computationally efficient semantic segmentation models suitable for resource-constrained mobile devices is an open challenge in computer vision research. To address this challenge, we propose a novel real-time semantic scene segmentation model called Multi-encoder Context Aggregation Network (MCANet), which offers the best combination of low model complexity and state-of-the-art (SOTA) performance on benchmark datasets. While we follow the multi-encoder approach, our novelty lies in varying the number of scales across sub-encoders to capture both global context and local details effectively. We introduce suitable lateral connections between sub-encoders for improved feature refinement, and we optimize the backbone by exploiting the residual block of MobileNet for resource-constrained applications. On the decoder side, the proposed model includes a new Local and Global Context Aggregation (LGCA) module that significantly enhances semantic details in the segmentation output. Finally, we apply several known efficient convolution techniques in the classification module to make the model more computationally efficient. We provide a comprehensive evaluation of MCANet on multiple datasets containing structured and unstructured urban street scenes. Among existing real-time models with fewer than 3 million parameters, the proposed model is highly competitive: it achieves SOTA performance without ImageNet pre-trained weights in both structured and unstructured environments while remaining more compact for resource-constrained applications.
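To make the architectural description above concrete, the following is a minimal PyTorch sketch of the ideas the abstract names: two sub-encoders operating at different input scales, a lateral connection between them, MobileNet-style inverted residual blocks, and a local-and-global context aggregation step feeding a lightweight classifier. All class names (SubEncoder, LGCA, MCANetSketch), layer widths, and hyper-parameters are illustrative assumptions for exposition only and are not the authors' implementation.

# Illustrative sketch of a multi-encoder segmentation model with
# local-and-global context aggregation. Names and sizes are assumptions,
# not the published MCANet code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InvertedResidual(nn.Module):
    """MobileNet-style inverted residual block with a depthwise 3x3 conv."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
    def forward(self, x):
        return x + self.block(x)  # residual connection

class SubEncoder(nn.Module):
    """One encoder branch operating on one input scale."""
    def __init__(self, in_ch, out_ch, num_blocks=2):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU6(inplace=True),
        )
        self.blocks = nn.Sequential(*[InvertedResidual(out_ch) for _ in range(num_blocks)])
    def forward(self, x):
        return self.blocks(self.stem(x))

class LGCA(nn.Module):
    """Toy local-and-global aggregation: a local 3x3 branch gated by a
    globally pooled context vector."""
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.global_fc = nn.Conv2d(channels, channels, 1, bias=False)
    def forward(self, x):
        local = self.local(x)
        g = torch.sigmoid(self.global_fc(F.adaptive_avg_pool2d(x, 1)))
        return local * g + x

class MCANetSketch(nn.Module):
    """Two sub-encoders at different scales, a lateral connection,
    LGCA fusion, and a lightweight 1x1 classifier."""
    def __init__(self, num_classes=19, width=32):
        super().__init__()
        self.enc_full = SubEncoder(3, width)       # full-resolution branch
        self.enc_half = SubEncoder(3, width)       # half-resolution branch
        self.lateral = nn.Conv2d(width, width, 1)  # lateral connection
        self.lgca = LGCA(width)
        self.classifier = nn.Conv2d(width, num_classes, 1)
    def forward(self, x):
        f_full = self.enc_full(x)
        f_half = self.enc_half(F.interpolate(x, scale_factor=0.5,
                                             mode="bilinear", align_corners=False))
        f_half = F.interpolate(self.lateral(f_half), size=f_full.shape[2:],
                               mode="bilinear", align_corners=False)
        fused = self.lgca(f_full + f_half)
        logits = self.classifier(fused)
        return F.interpolate(logits, size=x.shape[2:],
                             mode="bilinear", align_corners=False)

if __name__ == "__main__":
    model = MCANetSketch()
    out = model(torch.randn(1, 3, 256, 512))
    print(out.shape)  # torch.Size([1, 19, 256, 512])

In this sketch the half-resolution branch supplies coarser context that is projected through the lateral connection and fused with the full-resolution features before aggregation; the published model differs in the number of sub-encoders, scales, and module internals.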

Keywords