IEEE Access (Jan 2023)

Infrastructure 3D Target Detection Based on Multi-Mode Fusion for Intelligent and Connected Vehicles

  • Xiucai Zhang,
  • Lei He,
  • Rui Lv,
  • Changcheng Jin,
  • Yuhai Wang

DOI
https://doi.org/10.1109/ACCESS.2023.3292174
Journal volume & issue
Vol. 11
pp. 72803 – 72812

Abstract

Autonomous driving technology faces significant safety challenges due to the lack of a global perspective and the limitations of long-range perception. It is widely recognized that vehicle-infrastructure cooperation is essential for achieving Level 5 autonomy, so it is imperative to develop vehicle-road collaboration that enables accurate, wide-range, multi-target 3D detection on the infrastructure side. In this paper, we propose using ResNet50+FPN as the backbone network and adding CoTNet and CBAM dual attention mechanisms to extract and encode four levels of image features. For point cloud feature extraction, we divide the point cloud into equally spaced 3D voxels and transform the group of points within each voxel into a unified feature representation through a voxel feature encoding (VFE) layer. For multi-mode fusion, we propose a simple and effective method based on regional point fusion and regional voxel fusion, and the VoxelNet architecture is used to combine the image and point cloud features. The proposed algorithm is evaluated on the DAIR-V2X dataset from both the 3D and BEV perspectives; compared with existing 3D object detection algorithms, it significantly improves the average precision (AP) for vehicles, pedestrians, and cyclists in wide-area, multi-object 3D detection on the infrastructure side.
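As a rough illustration of the point-cloud branch described above, the sketch below shows a voxel feature encoding layer in the style of VoxelNet, assuming PyTorch. The class name VFELayer, the tensor shapes, and the channel sizes are illustrative assumptions, not the authors' implementation; the intent is only to show the point-wise transform, max-pooling within a voxel, and concatenation of the pooled voxel feature back onto each point.

    import torch
    import torch.nn as nn


    class VFELayer(nn.Module):
        # Voxel Feature Encoding layer in the style of VoxelNet: a point-wise
        # fully connected layer followed by max-pooling over the points in a
        # voxel, with the pooled feature concatenated back onto every point.
        def __init__(self, in_channels: int, out_channels: int):
            super().__init__()
            self.units = out_channels // 2  # half the output comes from the point-wise branch
            self.linear = nn.Linear(in_channels, self.units)
            self.bn = nn.BatchNorm1d(self.units)

        def forward(self, points: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
            # points: (num_voxels, max_points_per_voxel, in_channels)
            # mask:   (num_voxels, max_points_per_voxel, 1); 1 for real points, 0 for padding
            x = self.linear(points)                          # point-wise fully connected layer
            x = self.bn(x.transpose(1, 2)).transpose(1, 2)   # batch norm over the channel dimension
            x = torch.relu(x) * mask                         # zero out padded points
            pooled, _ = x.max(dim=1, keepdim=True)           # locally aggregated voxel feature
            pooled = pooled.expand(-1, points.shape[1], -1)  # broadcast back to every point
            return torch.cat([x, pooled], dim=2)             # (num_voxels, max_points, out_channels)


    # Illustrative usage: 32 voxels, at most 35 points per voxel, 7 input features per point
    vfe = VFELayer(in_channels=7, out_channels=64)
    pts = torch.randn(32, 35, 7)
    msk = torch.ones(32, 35, 1)
    out = vfe(pts, msk)  # shape: (32, 35, 64)

Stacking such layers yields the unified per-voxel representation that is later fused with the image features; the region-based fusion strategy itself is specific to the paper and is not reproduced here.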

Keywords