A Transformer-Based Image-Guided Depth-Completion Model with Dual-Attention Fusion Module

Shuling Wang; Fengze Jiang; Xiaojin Gong

doi:10.3390/s24196270

Sensors (Sep 2024)

Affiliations

Shuling Wang: The College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
Fengze Jiang: The College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
Xiaojin Gong: The College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

DOI: https://doi.org/10.3390/s24196270
Journal volume & issue: Vol. 24, no. 19, p. 6270

Abstract

Depth information is crucial for perceiving three-dimensional scenes. However, depth maps captured directly by depth sensors are often incomplete and noisy. Our objective in the depth-completion task is to generate dense and accurate depth maps from sparse depth inputs by fusing guidance information from the corresponding color images obtained from camera sensors. To address these challenges, we introduce transformer models, which have shown great promise in the field of vision, into the task of image-guided depth completion. By leveraging the self-attention mechanism, we propose a novel network architecture that effectively meets the requirements of high accuracy and resolution in depth data. Specifically, we design a dual-branch model with a transformer-based encoder that serializes image features into tokens step by step and extracts multi-scale pyramid features suitable for pixel-wise dense prediction tasks. Additionally, we incorporate a dual-attention fusion module to enhance the fusion between the two branches. This module combines convolution-based spatial and channel-attention mechanisms, which are adept at capturing local information, with cross-attention mechanisms that excel at capturing long-distance relationships. Our model achieves state-of-the-art performance on both the NYUv2 depth and SUN-RGBD depth datasets, and our ablation studies confirm the effectiveness of the designed modules.
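
The fusion idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names (`channel_gate`, `cross_attention`, `dual_attention_fuse`), the absence of learned weights, the omission of the convolutional and spatial-attention parts, and the choice to sum the two paths are all assumptions made for illustration. It only shows how a local channel-reweighting path and a global cross-attention path over image tokens might be combined for the depth branch.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query token (depth branch) attends over all key/value tokens
    (image branch), which is what lets it capture long-distance relations.
    """
    scale = math.sqrt(len(queries[0]))
    dims = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(dims)])
    return out


def channel_gate(tokens):
    """Simplified channel attention (assumption: no learned layers).

    Squeezes each channel to its mean over tokens, maps it through a
    sigmoid, and rescales that channel in every token.
    """
    count = len(tokens)
    dims = len(tokens[0])
    means = [sum(t[c] for t in tokens) / count for c in range(dims)]
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    return [[t[c] * gates[c] for c in range(dims)] for t in tokens]


def dual_attention_fuse(depth_tokens, image_tokens):
    """Hypothetical fusion: sum of the local and the global path."""
    local_path = channel_gate(depth_tokens)
    global_path = cross_attention(depth_tokens, image_tokens, image_tokens)
    return [[a + b for a, b in zip(lt, gt)]
            for lt, gt in zip(local_path, global_path)]


# Toy inputs: three depth tokens and two image tokens, two channels each.
depth_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused = dual_attention_fuse(depth_tokens, image_tokens)
```

In a trained model the queries, keys, and values would come from learned projections, and the local path would use convolutions. The sketch keeps only the data flow, so `fused` has one output token per depth token with the same channel count.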

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
