IEEE Access (Jan 2024)
Object-Aware Semantic Scene Completion Through Attention-Based Feature Fusion and Voxel-Points Representation
Abstract
Semantic scene completion is a computer vision task that combines semantic segmentation and shape completion: its goal is to infer a complete 3D scene with semantic labels from a single-view RGB-D image. In recent years, some methods have adopted a voxel-points-based approach, converting voxelized scenes into point clouds to reduce the computational cost associated with 3D convolutions. However, most of these methods do not fully account for the geometric details of the objects in the scene. In this paper, we propose ASPNet (Attention-based Semantic Point Completion Network), a two-branch semantic scene completion algorithm that combines scene-level completion with object refinement. In the scene-level completion branch, we design the SPT (Semantic-based Point Transformer) module, which introduces semantic information into the traditional Point Transformer layer so that features are aggregated from neighboring keypoints of the same category. Using an object detection module and an object refinement module, ASPNet then refines the coarse semantic completion results obtained by directly encoding and decoding the RGB-D input. Quantitative results show that ASPNet incurs far less computational overhead than 3D convolution-based semantic scene completion algorithms, while its reconstructions retain more geometric detail.
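To illustrate the semantic-guided neighbor aggregation described above, the following is a minimal sketch of a Point Transformer-style vector-attention layer whose attention weights are masked by predicted semantic class. All names and hyperparameters here (SemanticPointAttention, k, dim, the MLP layouts) are illustrative assumptions, not the paper's actual SPT implementation.

# Illustrative sketch only: semantic-masked vector attention over k nearest neighbors.
# This is NOT the paper's SPT module; names and layer sizes are assumptions.
import torch
import torch.nn as nn

class SemanticPointAttention(nn.Module):
    """Aggregate each point's features from its k nearest neighbors,
    attending only to neighbors that share its predicted semantic class."""

    def __init__(self, dim: int, k: int = 16):
        super().__init__()
        self.k = k
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Relative-position encoding and attention-weight MLPs (as in Point Transformer)
        self.pos_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.attn_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, xyz, feats, labels):
        # xyz: (N, 3) point coordinates; feats: (N, C) point features;
        # labels: (N,) predicted semantic class of each point
        dist = torch.cdist(xyz, xyz)                        # (N, N) pairwise distances
        knn_idx = dist.topk(self.k, largest=False).indices  # (N, k) neighbor indices

        q = self.to_q(feats)                                # (N, C)
        k = self.to_k(feats)[knn_idx]                       # (N, k, C)
        v = self.to_v(feats)[knn_idx]                       # (N, k, C)
        rel_pos = self.pos_mlp(xyz.unsqueeze(1) - xyz[knn_idx])  # (N, k, C)

        # Vector attention weights from relative query/key differences plus position encoding
        w = self.attn_mlp(q.unsqueeze(1) - k + rel_pos)     # (N, k, C)

        # Semantic mask: suppress neighbors whose predicted class differs.
        # Each point is its own nearest neighbor, so the softmax is always well-defined.
        same_class = labels.unsqueeze(1) == labels[knn_idx]  # (N, k)
        w = w.masked_fill(~same_class.unsqueeze(-1), float("-inf"))
        w = torch.softmax(w, dim=1)

        return (w * (v + rel_pos)).sum(dim=1)               # (N, C) aggregated features

A call such as SemanticPointAttention(dim=64)(xyz, feats, labels) on (N, 3) coordinates, (N, 64) features, and (N,) class predictions returns refined per-point features; the masking step is what restricts aggregation to same-category keypoints, which is the behavior the abstract attributes to the SPT module.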
Keywords