Proceedings of the XXth Conference of Open Innovations Association FRUCT (Nov 2024)
Evaluating Vision-Language Models for Hematology Image Classification: Performance Analysis of CLIP and Its Biomedical AI Variants
Abstract
Vision-language models (VLMs) have shown remarkable potential across domains, particularly in zero-shot learning applications. This research evaluates the performance of three notable VLMs, CLIP, PLIP, and BiomedCLIP, on blood cell classification, with a specific emphasis on distinguishing normal from malignant (cancerous) cells. While CLIP demonstrates robust zero-shot capabilities on general tasks, this study probes its biomedical adaptations, PLIP and BiomedCLIP, to assess their effectiveness on specialized medical tasks such as hematological image classification. Additionally, we investigate the impact of prompt engineering on model performance, exploring how variations in prompt construction influence accuracy across these biomedical datasets. Extensive experiments were conducted on a variety of biomedical images, including microscopic blood cell images, brain MRIs, and chest X-rays, providing a comprehensive evaluation of the VLMs' applicability to medical imaging. Our findings reveal that while CLIP, trained on general-domain data, performs well in broader contexts, PLIP and BiomedCLIP, which are optimized for medical imagery, demonstrate enhanced accuracy in medical settings, particularly in hematology. The results underscore the strengths and limitations of these models, offering valuable insights into their adaptability, precision, and potential for future applications in medical image classification.
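To make the evaluated zero-shot protocol concrete, the minimal sketch below classifies a single blood-cell image with CLIP via the Hugging Face transformers library. The prompt templates and input file name are illustrative assumptions rather than the exact experimental setup; PLIP (e.g., the vinid/plip checkpoint) exposes the same interface, while BiomedCLIP is typically loaded through open_clip instead.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a general-purpose CLIP checkpoint; swapping in a biomedical
# variant such as "vinid/plip" uses the same loading calls.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate class prompts; their phrasing is exactly the prompt-engineering
# variable studied here (these specific templates are illustrative).
prompts = [
    "a microscopic image of a normal blood cell",
    "a microscopic image of a malignant blood cell",
]

image = Image.open("blood_cell.png")  # hypothetical input file
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; a softmax over
# the candidate prompts yields zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(prompts[probs.argmax().item()], probs.tolist())
```

Because classification reduces to ranking image-text similarities, changing only the prompt list adapts the same pipeline to other binary or multi-class tasks without retraining.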
Keywords