IET Intelligent Transport Systems (Dec 2024)
A multi‐stage model for bird's eye view prediction based on stereo‐matching model and RGB‐D semantic segmentation
Abstract
Bird's-Eye-View (BEV) maps are a powerful and detailed scene representation for intelligent vehicles, providing both the location and semantic information of nearby objects from a top-down perspective. BEV map generation is a complex multi-stage task, and existing methods typically perform poorly for distant scenes. Thus, the authors introduce a novel multi-stage model that infers a more accurate BEV map. First, the authors propose the Adaptive Aggregation with Stereo Mixture Density (AA-SMD) model, an improved stereo-matching model that eliminates bleeding artefacts and provides more accurate depth estimation. Next, the authors employ an RGB-Depth (RGB-D) semantic segmentation model to improve the segmentation performance and connectivity of their model. The depth and semantic segmentation maps are then combined to create an incomplete BEV map. Finally, the authors propose a Multi Strip Pooling Unet (MSP-Unet) model with hierarchical multi-scale (HMS) attention and strip pooling (SP) modules to refine the incomplete BEV map into the final prediction. The authors evaluate their model on a synthetic dataset generated with the Car Learning to Act (CARLA) simulator. The experimental results demonstrate that the model generates a highly accurate representation of the surrounding environment, achieving a state-of-the-art result of 61.50% Mean Intersection-over-Union (MIoU) across eight classes.
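The intermediate step the abstract describes, combining a depth map with a semantic segmentation map to form an (incomplete) top-down BEV grid, can be sketched as below. This is a minimal illustration under a pinhole-camera assumption, not the authors' implementation; the function name, grid parameters, and the height-band filter are all hypothetical choices for the sketch.

```python
import numpy as np

def project_to_bev(depth, labels, fx, fy, cx, cy,
                   bev_size=(100, 100), cell=0.5, max_range=50.0):
    """Back-project each pixel with its depth into 3D camera
    coordinates, then rasterise the (x, z) ground-plane positions
    into a top-down semantic grid (0 = unobserved, hence 'incomplete')."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth                        # forward distance from the camera
    x = (us - cx) * z / fx           # lateral offset
    y = (vs - cy) * z / fy           # vertical offset
    # keep points inside the mapped range and a plausible height band
    valid = ((z > 0) & (z < max_range) &
             (np.abs(x) < bev_size[1] * cell / 2) &
             (np.abs(y) < 3.0))
    # grid indices: row = distance ahead, column = lateral position
    rows = (z[valid] / cell).astype(int)
    cols = (x[valid] / cell + bev_size[1] // 2).astype(int)
    bev = np.zeros(bev_size, dtype=labels.dtype)
    inside = (rows < bev_size[0]) & (cols >= 0) & (cols < bev_size[1])
    bev[rows[inside], cols[inside]] = labels[valid][inside]
    return bev
```

Cells never hit by a back-projected pixel stay zero, which is exactly the incompleteness (occlusions, limited field of view) that the MSP-Unet stage is then trained to fill in.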
Keywords