IEEE Access (Jan 2024)
MonoMPV: Monocular 3D Object Detection With Multiple Projection Views on Edge Devices
Abstract
In the field of autonomous driving, monocular 3D object detection addresses the task of representing a 3D scene from a single camera image and detecting 3D objects within it. While the Bird's-Eye View (BEV) method effectively reduces the computational burden of 3D scene representation, its neglect of height information can yield a less accurate depiction of complex 3D structures. This study introduces an innovative monocular 3D object detection framework called MonoMPV. The framework represents a complete 3D scene by mapping spatial objects onto Multi-Projection Views (MPV) without voxelization, thereby simplifying the process. The MPV module consists of two components: Feature Cross-Attention (FCA) and Projection Cross-Attention (PCA). FCA lifts image features to the MPV representation, while PCA enables direct information exchange among the views within MPV. In addition, a Triplet Loss for Top Feature (TLTF) is employed in conjunction with FCA and PCA to effectively distinguish top-plane features from background features. Together, these components allow the model to capture more complex 3D structures and give TLTF a precise optimization objective, improving the model's utilization of the data. Experimental results on the nuScenes dataset demonstrate that this approach surpasses existing monocular 3D object detection methods. To validate deployment on on-board edge computing devices, the monocular 3D object detection task was executed on the Jetson Orin NX edge device while maintaining high precision.
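The abstract does not give the exact form of TLTF; as a point of reference, a generic triplet margin loss over feature embeddings, with top-plane features as positives and background features as negatives, can be sketched as follows (the function name, the Euclidean distance metric, and the margin value are illustrative assumptions, not the paper's definition):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (illustrative sketch, not the paper's TLTF).

    anchor:   (N, D) anchor feature vectors
    positive: (N, D) features of the same class (e.g., top-plane)
    negative: (N, D) features of a different class (e.g., background)
    """
    # Euclidean distances of anchor-positive and anchor-negative pairs
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    # Hinge: positives should be closer than negatives by at least `margin`
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```

With such a loss, top-plane embeddings are pulled toward each other and pushed away from background embeddings by at least the margin, which matches the stated goal of separating top-plane and background features.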
Keywords