Complex & Intelligent Systems (Jun 2025)
Fusing feature consistency across views for multi-view stereo
Abstract
Dense feature matching is critically important in the learning-based multi-view stereo (MVS) pipeline. Feature consistency across views is a key factor affecting feature matching and plays a crucial role in object reconstruction by learning-based MVS methods, yet it has not been adequately considered in previous studies. To address this issue, we first introduce a color invariance module, which derives from the physics-based reflection model and the RGB-based Gaussian color model a set of object color properties that are independent of illumination and viewpoint. This module highlights consistent feature representations across views and facilitates the learning of feature consistency as prior knowledge. We then propose two pixel-wise feature losses, which further encourage and supervise the image feature extractor to learn consistent features for pixels with the same meaning across multiple views. By focusing on feature consistency across views, we enable the network to perceive similar visual representations among multiple views and boost performance on the MVS task. To demonstrate the rationality and effectiveness of these strategies for learning-based MVS, we conduct experiments on the DTU and Tanks & Temples datasets, achieving better reconstruction completeness. Compared with other state-of-the-art methods, our method also shows better generalization ability on the ETH3D dataset.
Keywords