Frontiers in Environmental Science (Jul 2024)
Efficient greenhouse segmentation with visual foundation models: achieving more with fewer samples
Abstract
Introduction: The Vision Transformer (ViT) model, which leverages self-supervised learning, has shown exceptional performance in natural image segmentation, suggesting its extensive potential in visual tasks. However, its effectiveness diminishes in remote sensing due to the varying perspectives of remote sensing images and unique optical properties of features like the translucency of greenhouses. Additionally, the high cost of training Visual Foundation Models (VFMs) from scratch for specific scenes limits their deployment.
Methods: This study investigates the feasibility of rapidly deploying VFMs on new tasks by using embedding vectors generated by VFMs as prior knowledge to enhance traditional segmentation models’ performance. We implemented this approach to improve the accuracy and robustness of segmentation with the same number of trainable parameters. Comparative experiments were conducted to evaluate the efficiency and effectiveness of this method, especially in the context of greenhouse detection and management.
Results: Our findings indicate that the use of embedding vectors facilitates rapid convergence and significantly boosts segmentation accuracy and robustness. Notably, our method achieves or exceeds the performance of traditional segmentation models using only about 40% of the annotated samples. This reduction in the reliance on manual annotation has significant implications for remote sensing applications.
Discussion: The application of VFMs in remote sensing tasks, particularly for greenhouse detection and management, demonstrated enhanced segmentation accuracy and reduced dependence on annotated samples. This method adapts more swiftly to different lighting conditions, enabling more precise monitoring of agricultural resources. Our study underscores the potential of VFMs in remote sensing tasks and opens new avenues for the expansive application of these models in diverse downstream tasks.
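The Methods summary above describes feeding VFM embedding vectors to a traditional segmentation model as prior knowledge. The abstract does not say how the two are combined, so the following is only a minimal sketch of one plausible reading: a coarse embedding grid is upsampled to the segmentation feature resolution and concatenated along the channel axis. The function names (`upsample_nearest`, `fuse_with_prior`), the nearest-neighbour resizing, and the concatenation itself are illustrative assumptions, not the authors' implementation.

```python
from typing import List

# A feature map is stored as [channel][row][col] nested lists of floats.
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Resize every channel to (out_h, out_w) by nearest-neighbour lookup."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [channel[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)
        ]
        for channel in fmap
    ]


def fuse_with_prior(seg_features: FeatureMap, vfm_embedding: FeatureMap) -> FeatureMap:
    """Append VFM embedding channels to the segmentation features (assumed fusion)."""
    out_h, out_w = len(seg_features[0]), len(seg_features[0][0])
    prior = upsample_nearest(vfm_embedding, out_h, out_w)
    # List concatenation here acts as channel-wise concatenation.
    return seg_features + prior


# Toy inputs (hypothetical sizes): 2 segmentation channels at 4x4,
# 3 embedding channels at 2x2.
seg = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
emb = [[[float(k), float(k) + 1.0], [float(k) + 2.0, float(k) + 3.0]] for k in range(3)]
fused = fuse_with_prior(seg, emb)
```

In this sketch the embedding is treated as a fixed input, so the fusion step itself adds no trainable parameters; in a real pipeline a tensor library would replace the nested lists.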
Keywords