A novel multi‐model 3D object detection framework with adaptive voxel‐image feature fusion

Zhao Liu; Zhongliang Fu; Gang Li; Shengyuan Zhang

doi:10.1049/cvi2.12269

IET Computer Vision (Aug 2024)

Affiliations

Zhao Liu: School of Remote Sensing and Information Engineering Wuhan University Wuhan Hubei China
Zhongliang Fu: School of Remote Sensing and Information Engineering Wuhan University Wuhan Hubei China
Gang Li: School of Remote Sensing and Information Engineering Wuhan University Wuhan Hubei China
Shengyuan Zhang: School of Remote Sensing and Information Engineering Wuhan University Wuhan Hubei China

DOI: https://doi.org/10.1049/cvi2.12269
Journal volume & issue: Vol. 18, no. 5
pp. 640 – 651

Abstract

The multifaceted nature of sensor data has long been a hurdle for those seeking to harness its full potential in the field of 3D object detection. Although the utilisation of point clouds as input has yielded exceptional results, effectively combining the complementary properties of multi‐sensor data remains a major challenge. This work presents a new approach to multi‐model 3D object detection, called adaptive voxel‐image feature fusion (AVIFF). AVIFF is an end‐to‐end single‐shot framework that dynamically and adaptively fuses point cloud and image features, resulting in a more comprehensive and integrated analysis of camera and LiDAR sensor data. With the aid of the adaptive feature fusion module, spatialised image features are fused with voxel‐based point cloud features, while the Dense Fusion module preserves the distinctive characteristics of 3D point cloud data through the use of a heterogeneous architecture. Notably, the authors’ framework features a novel generalised intersection over union loss function that enhances the perceptibility of object localisation and rotation in 3D space. Comprehensive experimentation has validated the efficacy of the authors’ proposed modules, establishing AVIFF as a novel framework in the field of 3D object detection.
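
For readers unfamiliar with the generalised intersection over union (GIoU) idea mentioned above, the following is a minimal sketch of the standard GIoU computation for axis‐aligned 3D boxes. It is not the authors' loss: the abstract describes a rotation‐aware variant whose details are not given here, and the box layout `(x_min, y_min, z_min, x_max, y_max, z_max)` and the function names `box_volume`, `giou_3d` and `giou_loss_3d` are assumptions made for illustration only.

```python
from typing import Sequence

# Assumed box layout (illustrative): (x_min, y_min, z_min, x_max, y_max, z_max)
Box = Sequence[float]


def box_volume(box: Box) -> float:
    """Volume of an axis-aligned 3D box; degenerate boxes give 0."""
    dx = max(box[3] - box[0], 0.0)
    dy = max(box[4] - box[1], 0.0)
    dz = max(box[5] - box[2], 0.0)
    return dx * dy * dz


def giou_3d(a: Box, b: Box) -> float:
    """Standard GIoU for two axis-aligned 3D boxes (rotation is ignored)."""
    # Overlap extent along each axis, clamped at zero when the boxes are disjoint.
    inter = 1.0
    for i in range(3):
        inter *= max(min(a[i + 3], b[i + 3]) - max(a[i], b[i]), 0.0)

    union = box_volume(a) + box_volume(b) - inter

    # Smallest axis-aligned box enclosing both inputs.
    hull = 1.0
    for i in range(3):
        hull *= max(a[i + 3], b[i + 3]) - min(a[i], b[i])

    if union <= 0.0 or hull <= 0.0:
        return 0.0  # both boxes are degenerate; no meaningful overlap measure

    iou = inter / union
    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (hull - union) / hull


def giou_loss_3d(a: Box, b: Box) -> float:
    """Loss in [0, 2]: 0 for identical boxes, larger for distant boxes."""
    return 1.0 - giou_3d(a, b)
```

Unlike plain IoU, this quantity stays informative when the predicted and ground‐truth boxes do not overlap, because the enclosing‐box term still changes as the boxes move apart.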

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
