Overall Understanding of Indoor Scenes by Fusing Multiframe Local RGB-D Data Based on Conditional Random Fields

Haotian Chen; Longfei Su; Biao Zhang; Fengchi Sun; Jing Yuan; Jie Liu

doi:10.1109/ACCESS.2020.2985227

IEEE Access (Jan 2020)

Affiliations

Haotian Chen: College of Computer Science, Nankai University, Tianjin, China
Longfei Su: College of Software, Nankai University, Tianjin, China
Biao Zhang: College of Computer Science, Nankai University, Tianjin, China
Fengchi Sun: College of Software, Nankai University, Tianjin, China
Jing Yuan: College of Artificial Intelligence, Nankai University, Tianjin, China
Jie Liu: College of Artificial Intelligence, Nankai University, Tianjin, China

Journal volume & issue: Vol. 8
pp. 65035 – 65045

Abstract

Indoor mobile robots normally cannot capture complete information about a scene in a single frame of perceptual data because of the sensor's limited field of view. With incomplete scene information, a robot may misjudge the category of the current scene, which leads to operational errors. To address this problem, we propose an approach that uses conditional random fields (CRFs) to fuse multiframe RGB and depth (RGB-D) visual data corresponding to the same scene. The method takes full advantage of the prior knowledge that object categories are strongly related to scene attributes. As each new image arrives, we incrementally integrate the current object detection results to update the scene understanding: we identify objects duplicated across images, rank the available objects by their relevance to the scene, and fuse the new information into the existing CRF. With this approach, scene classification based on multiview images achieves higher precision than classification based on single image frames sampled at the same places. Additionally, a configuration map of the scene is incrementally built within the same framework. The map includes the identities of the recognized objects and various relations between them. Such a map would not only benefit common robotic tasks but also offer a novel channel for human-robot interaction. We evaluate our method on image sequences extracted from the NYU v2 dataset. The results show that our approach outperforms state-of-the-art baselines.
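The abstract describes the incremental fusion pipeline only at a high level. The Python code below is a minimal sketch of one way such fusion could work, not the authors' implementation. It assumes a star-shaped CRF in which a single scene node is linked to each detected object node, with made-up scene-object potentials (`PSI`), a hypothetical duplicate test based on 3D centroid distance (`same_object`), and a hypothetical relevance score (`relevance`) used to keep the top-k objects. Because this graph is a tree, the scene marginal can be computed exactly by summing out each object's label.

```python
import math
from dataclasses import dataclass

# Hypothetical label sets and potentials; illustrative values, not taken from the paper.
SCENES = ["bedroom", "kitchen", "office"]

# PSI[scene][object_class]: assumed scene-object compatibility (pairwise potential, all > 0).
PSI = {
    "bedroom": {"bed": 0.9, "stove": 0.05, "monitor": 0.2, "chair": 0.4},
    "kitchen": {"bed": 0.05, "stove": 0.9, "monitor": 0.1, "chair": 0.5},
    "office": {"bed": 0.05, "stove": 0.05, "monitor": 0.9, "chair": 0.8},
}


@dataclass
class Detection:
    centroid: tuple  # 3D position in a shared map frame (assumed known from depth and camera pose)
    probs: dict      # object class -> detector probability (unary potential of the object node)


def same_object(a, b, radius=0.3):
    """Hypothetical duplicate test: same top class and centroids closer than `radius` metres."""
    top_a = max(a.probs, key=a.probs.get)
    top_b = max(b.probs, key=b.probs.get)
    return top_a == top_b and math.dist(a.centroid, b.centroid) < radius


def merge(a, b):
    """Fuse two observations of one object (assumes both use the same class keys)."""
    centroid = tuple((x + y) / 2 for x, y in zip(a.centroid, b.centroid))
    # Treat the observations as independent evidence: multiply, then renormalise.
    raw = {c: a.probs[c] * b.probs[c] for c in a.probs}
    z = sum(raw.values())
    return Detection(centroid, {c: v / z for c, v in raw.items()})


def add_frame(objects, frame):
    """Incrementally integrate one frame: merge duplicates, append unseen objects."""
    for det in frame:
        for i, obj in enumerate(objects):
            if same_object(obj, det):
                objects[i] = merge(obj, det)
                break
        else:
            objects.append(det)
    return objects


def message(obj, scene):
    """Object-to-scene message: sum over object labels c of phi_i(c) * psi(scene, c)."""
    return sum(p * PSI[scene][c] for c, p in obj.probs.items())


def relevance(obj):
    """Hypothetical relevance score: how strongly the object's message varies across scenes."""
    msgs = [message(obj, s) for s in SCENES]
    return max(msgs) - min(msgs)


def scene_posterior(objects, top_k=5, prior=None):
    """Exact scene marginal of a star-shaped CRF whose scene node links to the top_k objects."""
    prior = prior or {s: 1.0 / len(SCENES) for s in SCENES}
    kept = sorted(objects, key=relevance, reverse=True)[:top_k]
    log_score = {
        s: math.log(prior[s]) + sum(math.log(message(o, s)) for o in kept) for s in SCENES
    }
    m = max(log_score.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in log_score.values())
    return {s: math.exp(v - m) / z for s, v in log_score.items()}


# Usage: a chair seen from two nearby viewpoints, then a monitor elsewhere in the room.
objects = []
add_frame(objects, [Detection((0.0, 0.0, 0.0), {"bed": 0.1, "stove": 0.1, "monitor": 0.2, "chair": 0.6})])
post1 = scene_posterior(objects)
add_frame(objects, [
    Detection((0.1, 0.0, 0.0), {"bed": 0.1, "stove": 0.1, "monitor": 0.1, "chair": 0.7}),
    Detection((2.0, 1.0, 0.0), {"bed": 0.05, "stove": 0.05, "monitor": 0.8, "chair": 0.1}),
])
post2 = scene_posterior(objects)
print(len(objects), post1["office"], post2["office"])  # the office belief rises after the second frame
```

In this sketch the second chair observation is merged rather than counted twice, so the map holds two objects, and the extra evidence from the monitor sharpens the scene belief. Multiplying class beliefs in `merge` is a simple naive-Bayes-style choice; the paper's actual duplicate identification, ranking criterion, and CRF potentials may differ.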

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
