Masked Generative Light Field Prompting for Pixel-Level Structure Segmentations

Mianzhao Wang; Fan Shi; Xu Cheng; Shengyong Chen

doi:10.34133/research.0328

Research (Jan 2024)

Affiliations

Mianzhao Wang: The Engineering Research Center of Learning-Based Intelligent System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China.
Fan Shi: The Engineering Research Center of Learning-Based Intelligent System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China.
Xu Cheng: The Engineering Research Center of Learning-Based Intelligent System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China.
Shengyong Chen: The Engineering Research Center of Learning-Based Intelligent System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China.

DOI: https://doi.org/10.34133/research.0328
Journal volume & issue: Vol. 7

Abstract

Pixel-level structure segmentation has attracted considerable attention, playing a crucial role in autonomous driving within the metaverse and in enhancing comprehension in light field-based machine vision. However, current light field modeling methods fail to integrate appearance and geometric structural information into a coherent semantic space, thereby limiting the capacity of light fields to convey visual knowledge. In this paper, we propose a general light field modeling method for pixel-level structure segmentation, comprising a generative light field prompting encoder (LF-GPE) and a prompt-based masked light field pretraining (LF-PMP) network. Serving as the light field backbone, LF-GPE extracts appearance and geometric structural cues simultaneously and aligns these features in a unified visual space, facilitating semantic interaction. During the pretraining phase, LF-PMP integrates a mixed light field and a multi-view light field reconstruction. It prioritizes the geometric structural properties of the light field, enabling the backbone to accumulate rich prior knowledge. We evaluate the pretrained LF-GPE on two downstream tasks: light field salient object detection and light field semantic segmentation. Experimental results demonstrate that LF-GPE effectively learns high-quality light field features and achieves highly competitive performance on pixel-level segmentation tasks.

Published in Research

ISSN: 2096-5168 (Print); 2639-5274 (Online)
Publisher: American Association for the Advancement of Science (AAAS)
Country of publisher: United States
LCC subjects: Science
Website: https://spj.science.org/journal/research
