Online Scene Semantic Understanding Based on Sparsely Correlated Network for AR

Qianqian Wang; Junhao Song; Chenxi Du; Chen Wang

doi:10.3390/s24144756

Sensors (Jul 2024)

Affiliations

Qianqian Wang: The School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 102488, China
Junhao Song: The School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 102488, China
Chenxi Du: The School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 102488, China
Chen Wang: The School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 102488, China

DOI: https://doi.org/10.3390/s24144756
Journal volume & issue: Vol. 24, no. 14, p. 4756

Abstract

Real-world understanding bridges the information world and the physical world, enabling virtual–real mapping and interaction. However, scene understanding based solely on 2D images lacks geometric information and has limited robustness against occlusion. Depth sensors bring new opportunities, but fusing depth with geometric and semantic priors remains challenging. To address these concerns, our method exploits the repeatability of video stream data and the sparsity of newly generated data. We introduce a sparsely correlated network (SCN) architecture designed explicitly for online RGB-D instance segmentation. Additionally, we leverage object-level RGB-D SLAM systems, thereby overcoming the limitations of conventional approaches that emphasize only geometry or only semantics. We establish correlation over time and use it to develop rules and generate sparse data. We evaluate the system’s performance on the NYU Depth V2 and ScanNet V2 datasets, demonstrating that incorporating frame-to-frame correlation significantly improves the accuracy and consistency of instance segmentation compared with existing state-of-the-art alternatives. Moreover, using sparse data reduces data complexity while meeting the real-time requirement of 18 fps. Furthermore, by utilizing prior knowledge of object layout, we demonstrate a promising augmented reality application that illustrates the method's potential and practicality.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
