IEEE Access (Jan 2024)

Semantic Shape and Trajectory Reconstruction for Monocular Cooperative 3D Object Detection

  • Marton Cserni,
  • Andras Rovid

DOI
https://doi.org/10.1109/ACCESS.2024.3484672
Journal volume & issue
Vol. 12
pp. 167153 – 167167

Abstract

Read online

Currently the state-of-the-art monocular 3D object detectors use machine learning to estimate the 6DOF pose and shape of vehicles. This requires large amounts of precisely annotated 3D data for the training process and significant computing power for inference. Alternatively, there exist methods, which attempt to reconstruct target vehicle shapes and scales using projective geometry and classically detected feature points such as SURF and ORB. These methods use specific camera motion or geometrical constraints which cannot always be assumed. The resulting model is an unstructured point cloud which contains no semantic information, making its utility inconvenient in a distributed perception system. In this study, the applicability of semantic keypoints for vehicle shape and trajectory estimation is explored. A novel method is presented, which is capable reconstructing the semantic shape and trajectory of the target vehicle from a sequence of images with state-of-the art accuracy. The resulting semantic vertex model is then used for monocular, single frame 6DOF pose estimation with high accuracy. Building on this, a cooperative perception framework is also introduced. The algorithm is tested in both in-vehicle and infrastructure mounted mono-camera sensor setups. In addition to achieving state of the art depth accuracy in vehicle trajectory reconstruction on the Argoverse dataset, our method outperforms the state of the art shape-aware deep learning method in pose estimation in a cooperative perception scenario both in simulation and in real-world experiments.

Keywords