Cyborg and Bionic Systems (Jan 2024)
MonoAux: Fully Exploiting Auxiliary Information and Uncertainty for Monocular 3D Object Detection
Abstract
Monocular 3D object detection plays a pivotal role in autonomous driving, yet it poses a formidable challenge: 3D objects must be localized precisely from a single image without explicit depth information. Most existing methods fail to fully exploit the limited information available in the monocular setting. They typically produce a single detection outcome, omitting uncertainty analysis and result post-processing during inference, which limits overall performance. In this paper, we propose a comprehensive framework that maximizes the information extracted from monocular images while combining diverse depth estimates and incorporating uncertainty analysis. Specifically, we mine additional information intrinsic to the monocular 3D detection task to augment supervision, thereby mitigating the scarcity of training signals. Moreover, our framework recovers multiple sets of depth values from calculated visual heights; the final depth estimate and 3D confidence are then determined through an uncertainty fusion process that effectively reduces inference errors. Furthermore, to address task weight allocation in multi-task training, we present a versatile training strategy tailored to monocular 3D detection: measurement indicators monitor the progress of each task, and loss weights are adjusted adaptively. Experimental results on the KITTI and Waymo datasets confirm the effectiveness of our approach, which consistently outperforms the baseline framework across all difficulty levels while maintaining real-time efficiency.
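The depth-from-height recovery and uncertainty fusion summarized above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the standard pinhole relation z = f * H / h for each depth hypothesis and inverse-variance weighting for fusion, and all names (depth_from_height, fuse_depths, the example values) are hypothetical.

```python
import numpy as np

def depth_from_height(focal_length, height_3d, height_px):
    """Recover depth from the pinhole relation z = f * H / h.

    focal_length: camera focal length in pixels
    height_3d:    estimated physical object height (meters)
    height_px:    projected visual height in the image (pixels)
    """
    return focal_length * height_3d / height_px

def fuse_depths(depths, sigmas):
    """Fuse several depth hypotheses by inverse-variance weighting,
    then derive a scalar 3D confidence from the fused uncertainty."""
    depths = np.asarray(depths, dtype=np.float64)
    weights = 1.0 / np.asarray(sigmas, dtype=np.float64) ** 2
    fused_depth = np.sum(weights * depths) / np.sum(weights)
    fused_sigma = np.sqrt(1.0 / np.sum(weights))
    confidence = float(np.exp(-fused_sigma))  # maps uncertainty to (0, 1]
    return fused_depth, confidence

# Example: three depth hypotheses recovered from different visual heights.
f = 720.0  # focal length in pixels (hypothetical, KITTI-like magnitude)
depths = [depth_from_height(f, H, h)
          for H, h in [(1.55, 46.0), (1.60, 48.5), (1.52, 45.0)]]
sigmas = [0.9, 0.4, 1.3]  # per-hypothesis predicted uncertainties
z, conf = fuse_depths(depths, sigmas)
print(f"fused depth = {z:.2f} m, 3D confidence = {conf:.2f}")
```

Hypotheses with low predicted uncertainty dominate the fused estimate, and the fused uncertainty doubles as a 3D confidence score for result ranking.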
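The adaptive task-weighting idea can likewise be sketched. The snippet below shows one progress-driven scheme in the spirit of dynamic weight averaging, where a task whose loss is decreasing slowly receives a larger weight; the paper's actual measurement indicators may differ, and the function and task names here are illustrative assumptions.

```python
import numpy as np

def adaptive_task_weights(loss_history, temperature=2.0):
    """Compute per-task loss weights from recent training progress.

    loss_history: dict mapping task name -> list of recorded losses.
    Tasks with loss ratios near or above 1 (slow progress) get larger
    weights, focusing training on lagging sub-tasks.
    """
    names = list(loss_history)
    ratios = []
    for name in names:
        hist = loss_history[name]
        if len(hist) < 2:
            ratios.append(1.0)  # no progress signal yet: neutral weight
        else:
            ratios.append(hist[-1] / (hist[-2] + 1e-8))
    ratios = np.asarray(ratios) / temperature
    weights = len(names) * np.exp(ratios) / np.sum(np.exp(ratios))
    return dict(zip(names, weights))

# Example with three detection sub-tasks (hypothetical loss traces).
history = {
    "2d_bbox": [1.20, 0.80],  # improving quickly -> lower weight
    "depth":   [2.10, 2.05],  # nearly stalled    -> higher weight
    "heading": [0.90, 0.70],
}
print(adaptive_task_weights(history))
```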