Remote Sensing (Sep 2023)

Deep LiDAR-Radar-Visual Fusion for Object Detection in Urban Environments

  • Yuhan Xiao,
  • Yufei Liu,
  • Kai Luan,
  • Yuwei Cheng,
  • Xieyuanli Chen,
  • Huimin Lu

DOI: https://doi.org/10.3390/rs15184433
Journal volume & issue: Vol. 15, No. 18, p. 4433

Abstract

Robust environmental sensing and accurate object detection are crucial in enabling autonomous driving in urban environments. To achieve this goal, autonomous mobile systems commonly integrate multiple sensor modalities onboard, aiming to enhance accuracy and robustness. In this article, we focus on achieving accurate 2D object detection in urban autonomous driving scenarios. Considering the occlusion issues of using a single sensor from a single viewpoint, as well as the limitations of current vision-based approaches in bad weather conditions, we propose a novel multi-modal sensor fusion network called LRVFNet. This network effectively combines data from LiDAR, mmWave radar, and visual sensors through a deep multi-scale attention-based architecture. LRVFNet comprises three modules: a backbone responsible for generating distinct features from various sensor modalities, a feature fusion module utilizing the attention mechanism to fuse multi-modal features, and a pyramid module for object reasoning at different scales. By effectively fusing complementary information from multi-modal sensory data, LRVFNet enhances accuracy and robustness in 2D object detection. Extensive evaluations have been conducted on the public VOD dataset and the Flow dataset. The experimental results demonstrate the superior performance of our proposed LRVFNet compared to state-of-the-art baseline methods.
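
To make the three-module pipeline described in the abstract concrete, the following is a minimal PyTorch sketch of a LiDAR-radar-visual fusion detector of this kind. It is not the authors' implementation of LRVFNet: all class names, channel sizes, input encodings (a projected LiDAR range image, a radar bird's-eye-view map, an RGB image), the use of multi-head self-attention for fusion, and the two-level pyramid head are illustrative assumptions.

# Minimal sketch (not the authors' released code): a three-module
# LiDAR-radar-visual fusion detector following the structure described in
# the abstract. All class names, channel sizes, input encodings, and the
# choice of multi-head self-attention are illustrative assumptions.
import torch
import torch.nn as nn


class ModalityBackbone(nn.Module):
    # Per-modality feature extractor (hypothetical small CNN stand-in).
    def __init__(self, in_channels, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class AttentionFusion(nn.Module):
    # One plausible reading of "attention-based fusion": flatten each
    # modality's feature map into tokens and apply multi-head self-attention.
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):
        b, c, h, w = feats[0].shape
        tokens = torch.cat([f.flatten(2).transpose(1, 2) for f in feats], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        fused = self.norm(fused + tokens)
        # Keep the first modality's spatial grid as the fused feature map.
        return fused[:, : h * w].transpose(1, 2).reshape(b, c, h, w)


class PyramidHead(nn.Module):
    # Predict detections at two scales (an FPN-style assumption).
    def __init__(self, dim=64, num_outputs=6):
        super().__init__()
        self.down = nn.Conv2d(dim, dim, 3, stride=2, padding=1)
        self.heads = nn.ModuleList([nn.Conv2d(dim, num_outputs, 1) for _ in range(2)])

    def forward(self, x):
        return [head(level) for head, level in zip(self.heads, [x, self.down(x)])]


class FusionDetector(nn.Module):
    # Backbone per modality -> attention-based fusion -> pyramid reasoning.
    def __init__(self):
        super().__init__()
        self.lidar = ModalityBackbone(1)    # e.g. projected LiDAR range image
        self.radar = ModalityBackbone(1)    # e.g. radar bird's-eye-view map
        self.camera = ModalityBackbone(3)   # RGB image
        self.fusion = AttentionFusion()
        self.pyramid = PyramidHead()

    def forward(self, lidar, radar, camera):
        feats = [self.lidar(lidar), self.radar(radar), self.camera(camera)]
        return self.pyramid(self.fusion(feats))


if __name__ == "__main__":
    model = FusionDetector()
    outs = model(torch.randn(2, 1, 64, 64),
                 torch.randn(2, 1, 64, 64),
                 torch.randn(2, 3, 64, 64))
    print([o.shape for o in outs])  # two scales of per-location predictions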

Keywords