IEEE Access (Jan 2019)

DMFNet: Deep Multi-Modal Fusion Network for RGB-D Indoor Scene Segmentation

  • Jianzhong Yuan,
  • Wujie Zhou,
  • Ting Luo

DOI: https://doi.org/10.1109/ACCESS.2019.2955101
Journal volume & issue: Vol. 7, pp. 169350–169358

Abstract

Indoor scene segmentation is a challenging task in computer vision. We propose an indoor scene segmentation framework, called DMFNet, that fuses RGB images with complementary depth information. We use the squeeze-and-excitation residual network as the encoder to extract features from the RGB and depth data simultaneously, and we fuse these features in the decoder. Multiple average pooling layers and transposed convolution layers process the encoder outputs, and their results are fused across several decoder layers. To optimize the network parameters, we adopt a pyramid supervision training scheme, which applies supervised learning at different decoder layers to prevent vanishing gradients. We evaluated the proposed DMFNet on the NYU Depth V2 dataset, which consists of 1449 cluttered indoor scenes, and achieved results competitive with state-of-the-art methods.
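The pyramid supervision scheme mentioned in the abstract attaches a segmentation loss to several decoder resolutions rather than only the final output. A minimal NumPy sketch of this idea is shown below; it is an illustration under our own assumptions (nearest-neighbour label downsampling, unweighted sum of per-scale cross-entropy losses), not the authors' implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean pixel-wise cross-entropy. logits: (H, W, C); labels: (H, W) int ids."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    return -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels].mean()

def pyramid_supervision_loss(logits_per_scale, labels):
    """Sum the segmentation loss over every supervised decoder scale.

    Each decoder output is compared against the ground-truth label map,
    downsampled to that scale by nearest-neighbour indexing (an assumption
    made for this sketch).
    """
    total = 0.0
    for logits in logits_per_scale:
        h, w, _ = logits.shape
        ys = np.arange(h) * labels.shape[0] // h
        xs = np.arange(w) * labels.shape[1] // w
        total += softmax_cross_entropy(logits, labels[ys[:, None], xs[None, :]])
    return total
```

Because each intermediate decoder layer receives its own gradient signal, earlier layers are updated directly instead of relying solely on gradients propagated from the final output, which is how this scheme mitigates vanishing gradients.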

Keywords