IET Image Processing (Sep 2024)
MFE‐MVSNet: Multi‐scale feature enhancement multi‐view stereo with bi‐directional connections
Abstract
Recent advancements in deep learning have significantly improved performance in the multi‐view stereo (MVS) domain, yet achieving a balance between reconstruction efficiency and quality remains challenging for learning‐based MVS methods. To address this, we introduce MFE‐MVSNet, designed for more effective and precise depth estimation. Our model incorporates a pyramid feature extraction network, featuring efficient multi‐scale attention and multi‐scale feature enhancement modules. These components capture pixel‐level pairwise relationships and semantic features with long‐range contextual information, enhancing feature representation. Additionally, we propose a lightweight 3D UNet regularization network based on depthwise separable convolutions to reduce computational costs. This network employs bi‐directional skip connections, establishing a fluid relationship between encoders and decoders and enabling cyclic reuse of building blocks without adding learnable parameters. By integrating these methods, MFE‐MVSNet effectively balances reconstruction quality and efficiency. Extensive qualitative and quantitative experiments on the DTU dataset validate our model's competitiveness, demonstrating approximately 33% and 12% relative improvements in overall score compared to MVSNet and CasMVSNet, respectively. Compared to other MVS networks, our approach more effectively balances reconstruction quality with efficiency.
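The abstract attributes the reduced computational cost of the 3D UNet regularizer to depthwise separable convolutions. As a rough illustration of that general idea only (not the authors' implementation; the module and parameter names below are our own), the following PyTorch sketch factorizes a dense 3D convolution into a per-channel depthwise convolution followed by a 1×1×1 pointwise convolution:

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv3d(nn.Module):
    """Illustrative sketch: replaces a dense Conv3d with a depthwise Conv3d
    (groups = in_channels) plus a 1x1x1 pointwise Conv3d, which reduces
    parameters and FLOPs relative to a full k x k x k kernel."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        padding = kernel_size // 2
        # Depthwise: one k x k x k filter per input channel (no cross-channel mixing).
        self.depthwise = nn.Conv3d(in_channels, in_channels, kernel_size,
                                   stride=stride, padding=padding,
                                   groups=in_channels, bias=False)
        # Pointwise: 1x1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv3d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm3d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


if __name__ == "__main__":
    # Toy cost-volume tensor: (batch, channels, depth hypotheses, height, width).
    vol = torch.randn(1, 8, 48, 32, 40)
    block = DepthwiseSeparableConv3d(8, 16)
    print(block(vol).shape)  # torch.Size([1, 16, 48, 32, 40])
```

For a k × k × k kernel, this factorization shrinks the parameter count from roughly C_in · C_out · k³ to C_in · k³ + C_in · C_out, which is the source of the savings claimed for the lightweight regularization network.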
Keywords