IEEE Access (Jan 2024)

MonoMPV: Monocular 3D Object Detection With Multiple Projection Views on Edge Devices

  • Zhaoxue Deng,
  • Bingsen Hao,
  • Guofang Liu,
  • Xingquan Li,
  • Hanbing Wei,
  • Fei Huang,
  • Shengshu Liu

DOI
https://doi.org/10.1109/ACCESS.2024.3458412
Journal volume & issue
Vol. 12
pp. 136599 – 136612

Abstract

In autonomous driving, monocular 3D object detection aims to represent a 3D scene from a single camera image and detect the objects within it. While the Bird's-Eye View (BEV) representation effectively reduces the computational burden of modeling 3D scenes, its neglect of height information can lead to a less accurate depiction of complex 3D structures. This study introduces MonoMPV, a monocular 3D object detection framework that represents the complete 3D scene by mapping spatial objects onto Multi-Projection Views (MPV) without voxelization, thereby simplifying the process. The MPV representation is built from two components: Feature Cross-Attention (FCA) and Projection Cross-Attention (PCA). FCA lifts image features to the MPV level, while PCA enables direct information interaction among the views within MPV. In addition, a Triplet Loss for Top Feature (TLTF) is employed in conjunction with FCA and PCA to effectively distinguish top-plane features from background features. Together, these components allow more complex 3D structures to be modeled and give TLTF a precise optimization objective, improving how effectively the model exploits the available data. Experimental results on the nuScenes dataset show that this approach surpasses existing monocular 3D object detection methods. To demonstrate deployment on on-board edge computing devices, the monocular 3D object detection task was also executed on a Jetson Orin NX edge device while maintaining high precision.
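The abstract describes cross-attention modules that lift image features onto projection views (FCA/PCA) and a triplet loss that separates top-plane features from background features (TLTF). The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of these two ideas, with all module names, shapes, and index choices assumed for the example.

```python
# Minimal sketch (not the authors' code): cross-attention that lifts image
# features onto a projection-view query grid, in the spirit of FCA/PCA,
# plus a triplet margin loss separating top-plane from background features
# (TLTF-style). All shapes and names are illustrative assumptions.
import torch
import torch.nn as nn


class ViewCrossAttention(nn.Module):
    """Queries from one projection view attend to flattened image features."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, view_queries: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        # view_queries: (B, N_view_cells, C); img_feats: (B, H*W, C)
        out, _ = self.attn(view_queries, img_feats, img_feats)
        return self.norm(view_queries + out)


# Triplet loss: pull top-plane features together, push background features away.
triplet = nn.TripletMarginLoss(margin=1.0)

if __name__ == "__main__":
    B, HW, N, C = 2, 64 * 176, 200, 256
    img_feats = torch.randn(B, HW, C)       # flattened backbone features
    top_queries = torch.randn(B, N, C)      # queries for the top (height) view
    fca = ViewCrossAttention(C)
    top_feats = fca(top_queries, img_feats)  # image-to-view lifting (FCA-like)

    # Illustrative TLTF-style triplet: anchor/positive drawn from top-plane
    # cells, negative from a background cell (indices are placeholders).
    anchor, positive, negative = top_feats[:, 0], top_feats[:, 1], top_feats[:, -1]
    loss = triplet(anchor, positive, negative)
    print(loss.item())
```

In a full pipeline, a PCA-like module would apply the same cross-attention pattern between the different projection views themselves rather than between a view and the image.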

Keywords