A Coarse-to-Fine Transformer-Based Network for 3D Reconstruction from Non-Overlapping Multi-View Images

Yue Shan; Jun Xiao; Lupeng Liu; Yunbiao Wang; Dongbo Yu; Wenniu Zhang

doi:10.3390/rs16050901

Remote Sensing (Mar 2024)

A Coarse-to-Fine Transformer-Based Network for 3D Reconstruction from Non-Overlapping Multi-View Images

Yue Shan,
Jun Xiao,
Lupeng Liu,
Yunbiao Wang,
Dongbo Yu,
Wenniu Zhang

Affiliations

Yue Shan: School of Artificial Intelligence, University of Chinese Academy and Sciences, No. 19 Yuquan Road, Shijingshan District, Beijing 100049, China
Jun Xiao: School of Artificial Intelligence, University of Chinese Academy and Sciences, No. 19 Yuquan Road, Shijingshan District, Beijing 100049, China
Lupeng Liu: School of Artificial Intelligence, University of Chinese Academy and Sciences, No. 19 Yuquan Road, Shijingshan District, Beijing 100049, China
Yunbiao Wang: School of Artificial Intelligence, University of Chinese Academy and Sciences, No. 19 Yuquan Road, Shijingshan District, Beijing 100049, China
Dongbo Yu: School of Artificial Intelligence, University of Chinese Academy and Sciences, No. 19 Yuquan Road, Shijingshan District, Beijing 100049, China
Wenniu Zhang: School of Artificial Intelligence, University of Chinese Academy and Sciences, No. 19 Yuquan Road, Shijingshan District, Beijing 100049, China

DOI: https://doi.org/10.3390/rs16050901
Journal volume & issue: Vol. 16, no. 5
p. 901

Abstract

Read online

Reconstructing 3D structures from non-overlapping multi-view images is a crucial task in the field of 3D computer vision, since it is difficult to establish feature correspondences and infer depth from overlapping parts of views. Previous methods, whether generating the surface mesh or volume of an object, face challenges in simultaneously ensuring the accuracy of detailed topology and the integrity of the overall structure. In this paper, we introduce a novel coarse-to-fine Transformer-based reconstruction network to generate precise point clouds from multiple input images at sparse and non-overlapping viewpoints. Specifically, we firstly employ a general point cloud generation architecture enhanced by the concept of adaptive centroid constraint for the coarse point cloud corresponding to the object. Subsequently, a Transformer-based refinement module applies deformation to each point. We design an attention-based encoder to encode both image projection features and point cloud geometric features, along with a decoder to calculate deformation residuals. Experiments on ShapeNet demonstrate that our proposed method outperforms other competing methods.

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/

About the journal

Abstract

Keywords