IEEE Access (Jan 2024)
G-Fusion: LiDAR and Camera Feature Fusion on the Ground Voxel Space
Abstract
Object detection based on LiDAR and camera fusion is increasingly popular among researchers in the autonomous driving domain. Compared with camera-only and LiDAR-only methods, fusion-based methods improve detection accuracy on publicly available datasets. However, because of the complexity of their projection or fusion mechanisms, few of these methods can run in real time, even on an advanced desktop GPU. In this paper, we therefore propose G-Fusion, a new fusion detection model with a light and fast image view-transform module. Guided by our receptive field analysis of image feature maps, we project image features directly onto a single voxel layer located on the ground, then fuse the LiDAR and image features by concatenation and convolution. With this carefully designed module, G-Fusion substantially improves on the state-of-the-art inference speed on the nuScenes dataset while keeping competitive detection scores, achieving a good balance between the two. In addition, since the precision of the sensor extrinsic parameters is important for most fusion-based methods, we investigate in depth our model’s tolerance to calibration error and identify the noise conditions under which it fails.
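To make the idea concrete, the following is a minimal NumPy sketch of projecting image features onto a single ground voxel layer and fusing them with LiDAR features by concatenation and a 1x1 convolution. It is an illustration under stated assumptions, not the G-Fusion implementation: the function names (`ground_grid`, `project_image_features`, `fuse`), the pinhole camera model, the nearest-pixel sampling, the flat ground at a fixed height, and the use of a 1x1 kernel are all hypothetical choices made here for brevity.

```python
import numpy as np

def ground_grid(x_range, y_range, step, z_ground=0.0):
    """Centers of one voxel layer on the ground plane, shape (H, W, 3).

    Assumption: the ground is a flat plane at height z_ground in the ego frame.
    """
    xs = np.arange(x_range[0], x_range[1], step) + step / 2
    ys = np.arange(y_range[0], y_range[1], step) + step / 2
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    gz = np.full_like(gx, z_ground)
    return np.stack([gx, gy, gz], axis=-1)

def project_image_features(img_feat, grid, K, T_cam_from_ego):
    """Sample image features (C, Hf, Wf) at the pixel each ground voxel maps to.

    K is a 3x3 pinhole intrinsic matrix scaled to the feature map, and
    T_cam_from_ego is a 4x4 extrinsic matrix (both hypothetical inputs).
    Voxels behind the camera or outside the feature map receive zeros.
    """
    H, W, _ = grid.shape
    C, Hf, Wf = img_feat.shape
    pts = grid.reshape(-1, 3)
    pts_h = np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)
    cam = (T_cam_from_ego @ pts_h.T).T[:, :3]      # points in the camera frame
    depth = cam[:, 2]
    valid = depth > 1e-3                           # keep points in front of the camera
    uvw = (K @ cam.T).T
    safe = np.where(valid, depth, 1.0)             # avoid division by zero
    u = np.round(uvw[:, 0] / safe).astype(int)     # nearest-pixel column
    v = np.round(uvw[:, 1] / safe).astype(int)     # nearest-pixel row
    valid &= (u >= 0) & (u < Wf) & (v >= 0) & (v < Hf)
    out = np.zeros((C, H * W), dtype=img_feat.dtype)
    out[:, valid] = img_feat[:, v[valid], u[valid]]
    return out.reshape(C, H, W)

def fuse(lidar_bev, img_bev, weight):
    """Concatenate along channels, then apply a 1x1 convolution.

    weight has shape (C_out, C_lidar + C_img); a 1x1 convolution is a
    per-cell linear map over channels, written here with einsum.
    """
    cat = np.concatenate([lidar_bev, img_bev], axis=0)
    return np.einsum("oc,chw->ohw", weight, cat)
```

Because every ground voxel maps to at most one pixel, the view transform in this sketch reduces to a single gather over a precomputable index table, which is one plausible reason a single-layer projection can be fast. The same index table also shows where calibration error enters: perturbing `T_cam_from_ego` shifts the sampled pixels for every voxel.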
Keywords