Nexus (Jun 2025)

Generalized domain prompt learning for accessible scientific vision-language models

  • Qinglong Cao,
  • Yuntian Chen,
  • Lu Lu,
  • Hao Sun,
  • Zhengzhong Zeng,
  • Xiaokang Yang,
  • Dongxiao Zhang

DOI
https://doi.org/10.1016/j.ynexs.2025.100069
Journal volume & issue
Vol. 2, no. 2
p. 100069

Abstract

Large-scale vision-language models have shown remarkable success in general vision tasks, inspiring the development of domain-specific models. However, creating these models requires substantial investments in annotated data, computational resources, and energy, leaving such endeavors accessible mainly to industrial giants. This resource-intensive nature of model development inadvertently stifles academic research, particularly for smaller research groups, and limits the diversity and growth of the field. To address these challenges and promote more sustainable and equitable research, we introduce the generalized domain prompt learning framework. This framework enables the efficient transfer of robust recognition capabilities from natural vision to specialized domains without the need for large datasets or heavy computational overhead. By leveraging small-scale domain-specific foundation models and minimal prompt samples, the framework enriches the language component with domain-specific knowledge through quaternion networks, revealing cross-modal relationships between specialized vision features and natural vision-based contextual embeddings. Simultaneously, it drives the vision component toward domain-specific tasks via hierarchical propagation of vision prompts, grounded in well-matched vision-language relationships. To maximize the domain adaptation potential of these models, we also propose a novel low-rank adaptation technique. Extensive experiments across diverse domains—including remote sensing, medical imaging, geology, synthetic aperture radar, and fluid dynamics—demonstrate the effectiveness of the framework, achieving state-of-the-art domain recognition performance within a prompt learning structure.
Our work presents a pathway for inclusive and sustainable research that bridges the gap between academia and industry, empowering smaller research groups and promoting broader access to cutting-edge advancements in the context of future sustainable development.

Broader context

This research tackles a critical challenge in vision-language model (VLM) development: the disparity in resources between academia and industry. While large-scale VLMs have transformed natural vision tasks, their use in specialized domains like remote sensing and medical imagery remains constrained due to resource limitations. By introducing generalized domain prompt learning (GDPL), the study enables smaller research groups to adapt powerful VLMs for specialized domains like remote sensing, medical imaging, and geology, fostering inclusivity and innovation. The framework integrates insights from machine learning, natural language processing, and domain-specific expertise, highlighting the interdisciplinary nature of the work.

Long-term, GDPL aspires to democratize AI resources, bridging gaps between academia and industry. Its applications could revolutionize fields like environmental monitoring, healthcare diagnostics, and energy exploration by improving model accessibility and adaptability. This research promotes AI equity, empowering diverse research groups to tackle real-world problems, thus advancing societal benefits such as resource sustainability, improved health outcomes, and scientific collaboration.
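The abstract mentions a low-rank adaptation technique for maximizing domain adaptation. The paper's specific variant is not detailed here, but the general low-rank adaptation (LoRA) idea it builds on can be sketched as follows: a pretrained weight matrix W is frozen, and only two small factors A and B are trained, so the effective weight becomes W + (alpha/r)·BA. All names and sizes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 512, 512, 4  # illustrative layer sizes; r is the low-rank bottleneck

# Frozen pretrained weight (stands in for one VLM projection layer).
W = rng.standard_normal((d_out, d_in)) * 0.02

# Trainable low-rank factors. B starts at zero, so the adapted layer
# initially reproduces the pretrained layer exactly.
A = rng.standard_normal((r, d_in)) * 0.02
B = np.zeros((d_out, r))

def adapted_forward(x, alpha=8.0):
    """y = W x + (alpha / r) * B (A x): the generic low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = adapted_forward(x)

# With B = 0 the adapter is a no-op: output matches the frozen layer.
assert np.allclose(y, W @ x)

# Trainable parameters shrink from d_out*d_in to r*(d_in + d_out).
full, lora = W.size, A.size + B.size
print(f"full: {full}, low-rank: {lora} ({lora / full:.2%})")
```

The parameter savings (here roughly 1.6% of the full matrix) are what make adapting a large frozen model tractable for small research groups, which is the accessibility argument the abstract makes.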

Keywords