IET Computer Vision (Aug 2024)
A novel multi‐model 3D object detection framework with adaptive voxel‐image feature fusion
Abstract
Abstract The multifaceted nature of sensor data has long been a hurdle for those seeking to harness its full potential in the field of 3D object detection. Although the utilisation of point clouds as input has yielded exceptional results, the challenge of effectively combining the complementary properties of multi‐sensor data looms large. This work presents a new approach to multi‐model 3D object detection, called adaptive voxel‐image feature fusion (AVIFF). Adaptive voxel‐image feature fusion is an end‐to‐end single‐shot framework that can dynamically and adaptively fuse point cloud and image features, resulting in a more comprehensive and integrated analysis of the camera sensor and the LiDar sensor data. With the aid of the adaptive feature fusion module, spatialised image features can be adroitly fused with voxel‐based point cloud features, while the Dense Fusion module ensures the preservation of the distinctive characteristics of 3D point cloud data through the use of a heterogeneous architecture. Notably, the authors’ framework features a novel generalised intersection over union loss function that enhances the perceptibility of object localsation and rotation in 3D space. Comprehensive experimentation has validated the efficacy of the authors’ proposed modules, firmly establishing AVIFF as a novel framework in the field of 3D object detection.
Keywords