Virtual Reality & Intelligent Hardware (Oct 2019)

Real-time human segmentation by BowtieNet and a SLAM-based human AR system

  • Xiaomei Zhao,
  • Fulin Tang,
  • Yihong Wu

Journal volume & issue
Vol. 1, no. 5
pp. 511 – 524

Abstract

Background: It is generally difficult to obtain accurate pose and depth for a non-rigid moving object from a single RGB camera, which makes it hard to create augmented reality (AR) for such objects. In this study, we build an AR system for a non-rigid moving human from a single RGB camera by accurately computing pose and depth. The two key tasks are segmentation and monocular Simultaneous Localization and Mapping (SLAM). Most existing monocular SLAM systems are designed for static scenes, whereas in this AR system the human body is non-rigid and always moving.

Methods: To make the SLAM system suitable for a moving human, we first segment the rigid part of the human in each frame. A segmented moving body part can be regarded as a static object, and the relative motion between that body part and the camera can be treated as the motion of the camera. Typical SLAM systems designed for static scenes can then be applied. In the segmentation step, we first employ the proposed BowtieNet, which adds the atrous spatial pyramid pooling (ASPP) of DeepLab between the encoder and decoder of SegNet, to segment the human in the original frame. We then use color information to extract the face from the segmented human area.

Results: Based on the human segmentation results and a monocular SLAM, the system can change the video background and add a virtual object to the human.

Conclusions: Experiments on human image segmentation datasets show that BowtieNet achieves state-of-the-art human segmentation performance and is fast enough for real-time segmentation. Experiments on videos show that the proposed AR system can robustly add a virtual object to the human and can accurately change the video background.

Keywords: Augmented reality, Moving object, Reconstruction and tracking, Camera pose, Human segmentation
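
The background change mentioned in the abstract can be illustrated with a minimal sketch. It assumes a per-pixel human mask is already available (for example, from a segmentation network such as BowtieNet) and blends the frame with a new background using standard alpha compositing. The function name `replace_background` and the mask convention (values in [0, 1], with 1 marking the human) are assumptions made for this illustration; the abstract does not describe the authors' actual compositing step.

```python
import numpy as np


def replace_background(frame, mask, background):
    """Composite the segmented human over a new background.

    frame, background: H x W x 3 uint8 images of identical shape.
    mask: H x W array with values in [0, 1]; 1 marks human pixels.
          (Assumed convention, not taken from the paper.)
    """
    if frame.shape != background.shape:
        raise ValueError("frame and background must have the same shape")
    if mask.shape != frame.shape[:2]:
        raise ValueError("mask must match the frame height and width")

    # Add a trailing channel axis so the mask broadcasts over RGB.
    alpha = np.clip(mask.astype(np.float32), 0.0, 1.0)[..., None]

    # Standard alpha blend: human pixels come from the frame,
    # all other pixels come from the new background.
    blended = alpha * frame + (1.0 - alpha) * background
    return np.rint(blended).astype(np.uint8)
```

A soft mask (values strictly between 0 and 1) produces a smooth transition at the human boundary, while a binary mask gives a hard cut-out.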