IEEE Access (Jan 2024)

Object-Aware Semantic Scene Completion Through Attention-Based Feature Fusion and Voxel-Points Representation

  • Yubin Miao,
  • Junkang Wan,
  • Junjie Luo,
  • Hang Wu,
  • Ruochong Fu

DOI
https://doi.org/10.1109/ACCESS.2024.3370844
Journal volume & issue
Vol. 12
pp. 31431–31442

Abstract


Semantic scene completion is a computer vision technique that combines semantic segmentation and shape completion. Its purpose is to infer a complete 3D scene with semantic information from single-view RGB-D images. In recent years, some methods have adopted a voxel-points approach, converting voxelized scenes into point clouds to reduce the computational cost associated with 3D convolutions. However, the majority of such methods do not fully consider the geometric details of the objects in the scene. In this paper, we propose ASPNet (Attention-based Semantic Point Completion Network), a two-branch semantic scene completion algorithm that combines scene-level completion and object refinement. In the scene-level completion branch, we design the SPT (Semantic-based Point Transformer) module, which introduces semantic information into the traditional Point Transformer layer to aggregate features from neighboring keypoints of the same category. Using the object detection module and the object refinement module, ASPNet refines the coarse semantic completion results obtained by directly encoding and decoding the RGB-D inputs. Quantitative results show that ASPNet incurs much less computational overhead than 3D convolution-based semantic scene completion algorithms, while its reconstructions retain more geometric detail.
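As an illustration of the idea the abstract attributes to the SPT module, the following is a minimal NumPy sketch of attention-style feature aggregation restricted to neighboring keypoints that share a semantic label. All names, shapes, and the plain dot-product attention are hypothetical simplifications; the actual ASPNet layer builds on the Point Transformer and differs in detail.

```python
import numpy as np

def semantic_point_attention(feats, labels, neighbors):
    """Aggregate each point's features over neighbors of the same
    semantic class (hypothetical sketch of the SPT idea, not the
    authors' implementation).

    feats     : (n, d) per-point feature vectors
    labels    : (n,)   per-point semantic class ids
    neighbors : list of index arrays, neighbors[i] = neighbor ids of point i
    """
    n, d = feats.shape
    out = np.zeros_like(feats)
    for i in range(n):
        nbrs = np.asarray(neighbors[i])
        # keep only neighbors that share point i's semantic label
        same = nbrs[labels[nbrs] == labels[i]]
        if same.size == 0:
            out[i] = feats[i]  # no same-class neighbor: pass feature through
            continue
        # scaled dot-product attention weights over same-class neighbors
        scores = feats[same] @ feats[i] / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ feats[same]
    return out
```

Restricting the softmax to same-class neighbors is what keeps features from bleeding across object boundaries, which is the stated motivation for injecting semantics into the attention layer.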

Keywords