Computational Visual Media (Jul 2023)

STATE: Learning structure and texture representations for novel view synthesis

  • Xinyi Jing,
  • Qiao Feng,
  • Yu-Kun Lai,
  • Jinsong Zhang,
  • Yuanqiang Yu,
  • Kun Li

DOI
https://doi.org/10.1007/s41095-022-0301-9
Journal volume & issue
Vol. 9, no. 4
pp. 767–786

Abstract

Novel viewpoint image synthesis is very challenging, especially from sparse views, due to large changes in viewpoint and occlusion. Existing image-based methods fail to generate reasonable results for invisible regions, while geometry-based methods have difficulty synthesizing detailed textures. In this paper, we propose STATE, an end-to-end deep neural network, for sparse view synthesis by learning structure and texture representations. Structure is encoded as a hybrid feature field to predict reasonable structures for invisible regions while maintaining original structures for visible regions, and texture is encoded as a deformed feature map to preserve detailed textures. We propose a hierarchical fusion scheme with intra-branch and inter-branch aggregation, in which spatio-view attention enables multi-view fusion at the feature level, adaptively selecting important information by regressing pixel-wise or voxel-wise confidence maps. By decoding the aggregated features, STATE is able to generate realistic images with reasonable structures and detailed textures. Experimental results demonstrate that our method achieves qualitatively and quantitatively better results than state-of-the-art methods. Our method also enables texture and structure editing applications, benefiting from the implicit disentanglement of structure and texture. Our code is available at http://cic.tju.edu.cn/faculty/likun/projects/STATE.
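To make the confidence-based fusion concrete, below is a minimal PyTorch sketch of pixel-wise multi-view feature fusion as the abstract describes it: a small head regresses a confidence map per view, the maps are normalized across views, and per-view features are blended by these weights. The class and layer names (SpatioViewFusion, conf_head) are illustrative assumptions, not the paper's actual implementation, which additionally involves hierarchical intra-branch and inter-branch aggregation and voxel-wise variants.

```python
import torch
import torch.nn as nn

class SpatioViewFusion(nn.Module):
    """Hypothetical sketch: confidence-weighted fusion of per-view features.

    For each of V views, a small conv head regresses a pixel-wise
    confidence map; a softmax across views turns the maps into weights,
    and the fused feature is the weighted sum of the per-view features.
    """
    def __init__(self, channels: int):
        super().__init__()
        # Assumed confidence head: one 3x3 conv producing a 1-channel map.
        self.conf_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, V, C, H, W) -- feature maps extracted from V views
        b, v, c, h, w = feats.shape
        conf = self.conf_head(feats.reshape(b * v, c, h, w))  # (B*V, 1, H, W)
        conf = conf.reshape(b, v, 1, h, w)
        weights = torch.softmax(conf, dim=1)   # normalize across views
        return (weights * feats).sum(dim=1)    # fused features: (B, C, H, W)

# Usage: fuse 64-channel features from 3 source views.
fusion = SpatioViewFusion(channels=64)
multi_view_feats = torch.randn(2, 3, 64, 32, 32)
fused = fusion(multi_view_feats)
print(fused.shape)  # torch.Size([2, 64, 32, 32])
```

The softmax over the view dimension is one plausible way to realize "adaptively select important information": at each pixel, views with higher regressed confidence contribute more to the fused feature.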

Keywords