Nexus (Jun 2025)

Generalized domain prompt learning for accessible scientific vision-language models

  • Qinglong Cao,
  • Yuntian Chen,
  • Lu Lu,
  • Hao Sun,
  • Zhengzhong Zeng,
  • Xiaokang Yang,
  • Dongxiao Zhang

DOI
https://doi.org/10.1016/j.ynexs.2025.100069
Journal volume & issue
Vol. 2, no. 2
p. 100069

Abstract

Large-scale vision-language models have shown remarkable success in general vision tasks, inspiring the development of domain-specific models. However, creating these models requires substantial investments in annotated data, computational resources, and energy, leaving such endeavors accessible mainly to industrial giants. This resource-intensive nature of model development inadvertently stifles academic research, particularly for smaller research groups, and limits the diversity and growth of the field. To address these challenges and promote more sustainable and equitable research, we introduce the generalized domain prompt learning framework. This framework enables the efficient transfer of robust recognition capabilities from natural vision to specialized domains without the need for large datasets or heavy computational overhead. By leveraging small-scale domain-specific foundation models and minimal prompt samples, the framework enriches the language component with domain-specific knowledge through quaternion networks, revealing cross-modal relationships between specialized vision features and natural vision-based contextual embeddings. Simultaneously, it drives the vision component toward domain-specific tasks via hierarchical propagation of vision prompts, grounded in well-matched vision-language relationships. To maximize the domain adaptation potential of these models, we also propose a novel low-rank adaptation technique. Extensive experiments across diverse domains—including remote sensing, medical imaging, geology, synthetic aperture radar, and fluid dynamics—demonstrate the effectiveness of the framework, achieving state-of-the-art domain recognition performance within a prompt learning structure.
Our work presents a pathway for inclusive and sustainable research that bridges the gap between academia and industry, empowering smaller research groups and promoting broader access to cutting-edge advancements in the context of future sustainable development.

Broader context

This research tackles a critical challenge in vision-language model (VLM) development: the disparity in resources between academia and industry. While large-scale VLMs have transformed natural vision tasks, their use in specialized domains like remote sensing and medical imagery remains constrained due to resource limitations. By introducing generalized domain prompt learning (GDPL), the study enables smaller research groups to adapt powerful VLMs for specialized domains like remote sensing, medical imaging, and geology, fostering inclusivity and innovation. The framework integrates insights from machine learning, natural language processing, and domain-specific expertise, highlighting the interdisciplinary nature of the work.

Long-term, GDPL aspires to democratize AI resources, bridging gaps between academia and industry. Its applications could revolutionize fields like environmental monitoring, healthcare diagnostics, and energy exploration by improving model accessibility and adaptability. This research promotes AI equity, empowering diverse research groups to tackle real-world problems, thus advancing societal benefits such as resource sustainability, improved health outcomes, and scientific collaboration.
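The abstract mentions a low-rank adaptation technique for maximizing domain adaptation. The paper's specific variant is not detailed here, but the general low-rank adaptation (LoRA) idea it builds on can be sketched as follows: a pretrained weight matrix W is frozen, and only two small factors A and B are trained, so the effective weight becomes W + (alpha/r)·BA. All names and sizes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 512, 512, 4  # illustrative layer sizes; r is the low-rank bottleneck

# Frozen pretrained weight (stands in for one VLM projection layer).
W = rng.standard_normal((d_out, d_in)) * 0.02

# Trainable low-rank factors. B starts at zero, so the adapted layer
# initially reproduces the pretrained layer exactly.
A = rng.standard_normal((r, d_in)) * 0.02
B = np.zeros((d_out, r))

def adapted_forward(x, alpha=8.0):
    """y = W x + (alpha / r) * B (A x): the generic low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = adapted_forward(x)

# With B = 0 the adapter is a no-op: output matches the frozen layer.
assert np.allclose(y, W @ x)

# Trainable parameters shrink from d_out*d_in to r*(d_in + d_out).
full, lora = W.size, A.size + B.size
print(f"full: {full}, low-rank: {lora} ({lora / full:.2%})")
```

The parameter savings (here roughly 1.6% of the full matrix) are what make adapting a large frozen model tractable for small research groups, which is the accessibility argument the abstract makes.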

Keywords