SIANet: 3D object detection with structural information augment network

Jing Zhou; Tengxing Lin; Zixin Gong; Xinhan Huang

doi:10.1049/cvi2.12272

IET Computer Vision (Aug 2024)

Affiliations

Jing Zhou: School of Artificial Intelligence Jianghan University Wuhan China
Tengxing Lin: School of Artificial Intelligence Jianghan University Wuhan China
Zixin Gong: School of Artificial Intelligence Jianghan University Wuhan China
Xinhan Huang: School of Artificial Intelligence and Automation Huazhong University of Science and Technology Wuhan China

DOI: https://doi.org/10.1049/cvi2.12272
Journal volume & issue: Vol. 18, no. 5
pp. 682 – 695

Abstract

3D object detection from point clouds has been widely applied in autonomous driving in recent years. In practical applications, the point clouds of some objects are incomplete because of occlusion or long distance, so these objects carry insufficient structural information, which greatly degrades detection performance. To address this challenge, the authors design a Structural Information Augment (SIA) Network for 3D object detection, named SIANet. Specifically, the authors design an SIA module that reconstructs the complete shapes of objects within proposals to enhance their geometric features; these features are then fused into the spatial features of the object for box refinement, so that accurate detection boxes can be predicted. In addition, the authors construct a novel Unet‐like Context‐enhanced Transformer backbone network, which stacks Context‐enhanced Transformer modules and an upsampling branch to capture contextual information efficiently and generate high‐quality proposals for the SIA module. Extensive experiments show that SIANet effectively improves detection performance, surpassing the baseline network by 1.04% mean Average Precision (mAP) on the KITTI dataset and by 0.75% LEVEL_2 mAP on the Waymo dataset.

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
