IEEE Access (Jan 2019)

DMFNet: Deep Multi-Modal Fusion Network for RGB-D Indoor Scene Segmentation

  • Jianzhong Yuan,
  • Wujie Zhou,
  • Ting Luo

DOI: https://doi.org/10.1109/ACCESS.2019.2955101
Journal volume & issue: Vol. 7, pp. 169350–169358

Abstract

Indoor scene segmentation is a challenging task in computer vision. We propose an indoor scene segmentation framework, called DMFNet, that fuses RGB images with complementary depth information. We use the squeeze-and-excitation residual network as the encoder to extract features from the RGB and depth data simultaneously, and we fuse these features in the decoder. Multiple average pooling layers and transposed convolution layers process the encoder outputs, and their results are fused across several decoder layers. To optimize the network parameters, we adopt a pyramid supervision training scheme, which applies supervised learning at different decoder layers to prevent vanishing gradients. We evaluated the proposed DMFNet on the NYU Depth V2 dataset, which consists of 1449 cluttered indoor scenes, and achieved results competitive with state-of-the-art methods.
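The pyramid supervision scheme mentioned in the abstract attaches a segmentation loss to several decoder resolutions rather than only the final output. A minimal NumPy sketch of this idea is shown below; it is an illustration under our own assumptions (nearest-neighbour label downsampling, unweighted sum of per-scale cross-entropy losses), not the authors' implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean pixel-wise cross-entropy. logits: (H, W, C); labels: (H, W) int ids."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    return -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels].mean()

def pyramid_supervision_loss(logits_per_scale, labels):
    """Sum the segmentation loss over every supervised decoder scale.

    Each decoder output is compared against the ground-truth label map,
    downsampled to that scale by nearest-neighbour indexing (an assumption
    made for this sketch).
    """
    total = 0.0
    for logits in logits_per_scale:
        h, w, _ = logits.shape
        ys = np.arange(h) * labels.shape[0] // h
        xs = np.arange(w) * labels.shape[1] // w
        total += softmax_cross_entropy(logits, labels[ys[:, None], xs[None, :]])
    return total
```

Because each intermediate decoder layer receives its own gradient signal, earlier layers are updated directly instead of relying solely on gradients propagated from the final output, which is how this scheme mitigates vanishing gradients.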

Keywords