IEEE Access (Jan 2025)

Cross-Supervised LiDAR-Camera Fusion for 3D Object Detection

  • Chao Jie Zuo,
  • Cao Yu Gu,
  • Yi Kun Guo,
  • Xiao Dong Miao

DOI: https://doi.org/10.1109/ACCESS.2024.3518564
Journal volume & issue: Vol. 13, pp. 10447–10458

Abstract


Fusing LiDAR and camera information is essential for accurate and reliable 3D object detection in autonomous driving systems. Owing to the inherent differences between the two modalities, finding an efficient and accurate fusion method is of great importance. Recently, significant progress has been made in 3D object detection methods based on the lift-splat-shoot (LSS) paradigm. However, inaccurate depth estimation and substantial semantic information loss remain major factors limiting detection accuracy. In this paper, we propose a cross-fusion framework over dual spatial representations, bird's-eye view (BEV) and camera view, establishing soft links between them to fully exploit the information carried by each modality. The framework consists of two key components: a gated LiDAR-supervised BEV (GLS-BEV) module and a multi-attention cross-fusion (MACF) module. The former achieves accurate depth estimation by projecting LiDAR points, which carry exact depth, into image space to supervise the view transformation, constructing point-cloud features in the vehicle's perspective. The latter uses three sub-attention modules with distinct roles to achieve cross-modal interaction within a shared space, effectively reducing semantic loss. On the nuScenes benchmark, our method achieves strong 3D object detection results with 71.8 mAP and 74.2 NDS. The code is available at https://github.com/zcj223311/CSDSFusion.
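For intuition, the core GLS-BEV idea of using projected LiDAR depth to supervise an LSS-style categorical depth head can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (lidar_depth_targets, depth_supervision_loss), the (u, v, depth) point layout, and the cross-entropy form are assumptions made for this sketch; see the linked repository for the actual code.

    # Illustrative sketch only: LiDAR-supervised depth for an LSS-style head.
    import torch
    import torch.nn.functional as F

    def lidar_depth_targets(points_cam, H, W, depth_bins):
        """Rasterize LiDAR points (already projected to the camera) into a
        per-pixel depth-bin index map; -1 marks pixels with no LiDAR return.

        points_cam: (N, 3) tensor with columns (u, v, metric depth).
        depth_bins: 1-D tensor of bin edges for the categorical depth head.
        """
        u = points_cam[:, 0].long().clamp(0, W - 1)
        v = points_cam[:, 1].long().clamp(0, H - 1)
        d = points_cam[:, 2]
        target = torch.full((H, W), -1, dtype=torch.long,
                            device=points_cam.device)
        # Discretize metric depth into the head's depth bins.
        bin_idx = torch.bucketize(d, depth_bins).clamp(0, len(depth_bins) - 1)
        # On pixel collisions the last write wins; a real pipeline would
        # keep the nearest return per pixel.
        target[v, u] = bin_idx
        return target

    def depth_supervision_loss(depth_logits, target):
        """Cross-entropy between the predicted per-pixel depth distribution
        (D, H, W) and the LiDAR-derived bin targets; pixels without a LiDAR
        return (-1) are ignored."""
        D, H, W = depth_logits.shape
        logits = depth_logits.permute(1, 2, 0).reshape(-1, D)
        return F.cross_entropy(logits, target.reshape(-1), ignore_index=-1)

In practice one would also handle multiple cameras and ego-motion compensation, but the supervision signal is the same: every pixel with a LiDAR return constrains the predicted depth distribution used in the lift step.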
