Complex & Intelligent Systems (Aug 2023)

TCANet: three-stream coordinate attention network for RGB-D indoor semantic segmentation

  • Weikuan Jia,
  • Xingchao Yan,
  • Qiaolian Liu,
  • Ting Zhang,
  • Xishang Dong

DOI: https://doi.org/10.1007/s40747-023-01210-4
Journal volume & issue: Vol. 10, no. 1, pp. 1219–1230

Abstract


Semantic segmentation plays a vital role in indoor scene analysis, but its accuracy remains limited by the complex conditions of indoor scenes, and the task is difficult to complete using RGB images alone. Because depth images supply additional 3D geometric information, researchers incorporate them to improve the accuracy of indoor semantic segmentation; however, effectively fusing the depth information with the RGB images remains a challenge. To address this issue, a three-stream coordinate attention network is proposed. The network introduces a multi-modal feature fusion module for RGB-D features that aggregates the two modalities along both the spatial and channel dimensions. Three convolutional neural network branches form a parallel three-stream structure that processes the RGB features, the depth features, and the fused features, respectively. On one hand, this design preserves the original RGB and depth feature streams; on the other, it allows the fused feature stream to be exploited and propagated more effectively. An embedded ASPP module refines the semantic information by aggregating features at multiple scales, yielding more accurate representations. Experimental results show that the proposed model achieves a state-of-the-art mIoU of 50.2% on the NYUDv2 dataset and also performs well on the more complex SUN-RGBD dataset.
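To make the fusion idea concrete, the following is a minimal sketch (pure Python, no deep-learning framework) of the coordinate-attention mechanism the abstract alludes to: a C×H×W feature map is pooled along each spatial axis separately, the pooled vectors are turned into per-position gates, and the map is reweighted. The sigmoid gating and the elementwise summation of the attended RGB and depth maps are illustrative assumptions, not the authors' exact module (which also involves learned encodings).

```python
import math

def coord_attention(feat):
    """feat: nested list [C][H][W] of floats -> reweighted map, same shape."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Direction-aware pooling: average along W (keeps the H axis)
    # and along H (keeps the W axis).
    pool_h = [[sum(feat[c][h]) / W for h in range(H)] for c in range(C)]
    pool_w = [[sum(feat[c][h][w] for h in range(H)) / H for w in range(W)]
              for c in range(C)]
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    # The gate at (c, h, w) is the product of the two directional gates,
    # so each position is modulated by both coordinate directions.
    return [[[feat[c][h][w] * sig(pool_h[c][h]) * sig(pool_w[c][w])
              for w in range(W)] for h in range(H)] for c in range(C)]

def fuse_rgbd(rgb, depth):
    """Hypothetical fusion step: attend to each modality, then sum elementwise."""
    a, b = coord_attention(rgb), coord_attention(depth)
    C, H, W = len(a), len(a[0]), len(a[0][0])
    return [[[a[c][h][w] + b[c][h][w] for w in range(W)]
             for h in range(H)] for c in range(C)]
```

In a real implementation the pooled vectors would pass through shared convolutions before gating, but the sketch shows why coordinate attention can aggregate information along the spatial dimensions while remaining channel-wise, which is the property the fusion module exploits.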

Keywords