FusionRCNN: LiDAR-Camera Fusion for Two-Stage 3D Object Detection

Xinli Xu; Shaocong Dong; Tingfa Xu; Lihe Ding; Jie Wang; Peng Jiang; Liqiang Song; Jianan Li

doi:10.3390/rs15071839

Remote Sensing (Mar 2023)

Affiliations

Xinli Xu: Image Engineering & Video Technology Lab, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
Shaocong Dong: Image Engineering & Video Technology Lab, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
Tingfa Xu: Image Engineering & Video Technology Lab, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
Lihe Ding: Image Engineering & Video Technology Lab, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
Jie Wang: Image Engineering & Video Technology Lab, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
Peng Jiang: National Astronomical Observatories of China, Beijing 100107, China
Liqiang Song: National Astronomical Observatories of China, Beijing 100107, China
Jianan Li: Image Engineering & Video Technology Lab, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China

DOI: https://doi.org/10.3390/rs15071839
Journal volume & issue: Vol. 15, no. 7, p. 1839

Abstract

Accurate and reliable perception systems are essential for autonomous driving and robotics, and achieving them requires multi-sensor 3D object detection. Existing 3D detectors have significantly improved accuracy by adopting a two-stage paradigm that relies solely on LiDAR point clouds for 3D proposal refinement. However, the sparsity of point clouds, particularly at long range, makes it difficult for a LiDAR-only refinement module to recognize and localize objects accurately. To address this issue, we propose FusionRCNN, a novel multi-modality two-stage approach that effectively and efficiently fuses point clouds and camera images within Regions of Interest (RoIs). FusionRCNN adaptively integrates sparse geometric information from LiDAR and dense texture information from the camera in a unified attention mechanism. Specifically, in the RoI extraction step, FusionRCNN applies RoIPooling to obtain an image set of uniform size and obtains the point set by sampling raw points within each proposal. It then applies intra-modality self-attention to enhance the domain-specific features, followed by a well-designed cross-attention to fuse the information from the two modalities. FusionRCNN is fundamentally plug-and-play and supports different one-stage methods with almost no architectural changes. Extensive experiments on the KITTI and Waymo benchmarks demonstrate that our method significantly boosts the performance of popular detectors. Remarkably, FusionRCNN improves the strong SECOND baseline by 6.14% mAP on Waymo and outperforms competing two-stage approaches.
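
The RoI-level fusion described in the abstract (sample points inside a proposal, apply self-attention within each modality, then cross-attention from points to image features) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes single-head attention with no learned projections, axis-aligned proposal boxes, and random stand-ins for the pooled image features. All function names (`sample_points_in_box`, `attention`, `fuse_roi`) and all sizes are hypothetical choices made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention (single head, no learned weights)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values


def sample_points_in_box(points, center, size, num_samples, rng):
    """Sample a fixed number of raw points inside one proposal.

    Assumption: the box is axis-aligned; real 3D proposals also carry
    a heading angle, which is ignored here.
    """
    half = np.asarray(size) / 2.0
    mask = np.all(np.abs(points - center) <= half, axis=1)
    inside = points[mask]
    if len(inside) == 0:
        # Empty proposal: fall back to zero padding.
        return np.zeros((num_samples, 3))
    # Sample with replacement only when the box holds too few points.
    idx = rng.choice(len(inside), size=num_samples,
                     replace=len(inside) < num_samples)
    return inside[idx]


def fuse_roi(point_feats, image_feats):
    """Intra-modality self-attention, then point-to-image cross-attention."""
    # Self-attention within each modality, with residual connections.
    point_feats = point_feats + attention(point_feats, point_feats, point_feats)
    image_feats = image_feats + attention(image_feats, image_feats, image_feats)
    # Cross-attention: point tokens query the image tokens.
    return point_feats + attention(point_feats, image_feats, image_feats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels = 16                                   # hypothetical feature width
    cloud = rng.uniform(-5.0, 5.0, size=(500, 3))   # synthetic LiDAR points
    roi_points = sample_points_in_box(
        cloud, center=np.zeros(3), size=(4.0, 2.0, 2.0),
        num_samples=32, rng=rng)
    # Lift xyz coordinates to feature space with a random linear map
    # (stand-in for a learned point embedding).
    point_feats = roi_points @ rng.normal(size=(3, channels))
    # Stand-in for a 7x7 RoI-pooled image feature map, flattened to 49 tokens.
    image_feats = rng.normal(size=(49, channels))
    fused = fuse_roi(point_feats, image_feats)
    print(fused.shape)
```

In this sketch the fused output keeps one token per sampled point, so a downstream refinement head would see point features enriched with image context. Whether the paper's cross-attention uses points or image tokens as queries is not stated in the abstract, so the direction chosen here is an assumption.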

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
