Applied Sciences (Nov 2023)

Vision-Language Models for Zero-Shot Classification of Remote Sensing Images

  • Mohamad Mahmoud Al Rahhal,
  • Yakoub Bazi,
  • Hebah Elgibreen,
  • Mansour Zuair

DOI: https://doi.org/10.3390/app132212462
Journal volume & issue: Vol. 13, no. 22, p. 12462

Abstract

Zero-shot classification is challenging because it requires a model to categorize images from classes it never encountered during training. Previous work in remote sensing (RS) has addressed this task by training image-based models on known RS classes and then predicting labels for unseen classes, but the resulting accuracy has been unsatisfactory. In this paper, we propose an alternative approach that leverages vision-language models (VLMs) pre-trained to capture the associations between image-text pairs drawn from diverse general computer vision datasets. Specifically, our investigation covers thirteen VLMs derived from Contrastive Language-Image Pre-Training (CLIP/Open-CLIP) with varying parameter counts. In our experiments, we identify the most suitable prompt for querying the language capabilities of the VLM with RS images. Furthermore, we show that the zero-shot classification accuracy on three widely used RS scene datasets, particularly with large CLIP models, surpasses that of existing RS solutions.
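To illustrate the approach the abstract describes, below is a minimal sketch of CLIP-style zero-shot classification using the open-source open_clip package. The class names, the prompt template ("a satellite photo of a ..."), and the image path are illustrative placeholders, not the paper's tuned prompt or datasets; the ViT-B/32 checkpoint is just one of the model sizes a study like this might compare.

```python
import torch
import open_clip
from PIL import Image

# Load a pre-trained Open-CLIP model; the paper evaluates thirteen
# CLIP/Open-CLIP variants of varying parameter counts.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Hypothetical RS scene classes; the prompt template is a placeholder,
# since the best prompt is determined experimentally in the paper.
class_names = ["airport", "forest", "harbor", "residential area"]
prompts = [f"a satellite photo of a {c}" for c in class_names]

image = preprocess(Image.open("scene.jpg")).unsqueeze(0)
text = tokenizer(prompts)

with torch.no_grad():
    # Embed the image and each class prompt in the shared space.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then rank classes by cosine similarity to the image.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

pred = class_names[probs.argmax(dim=-1).item()]
print(f"Predicted class: {pred}")
```

No RS-specific training is involved: the unseen classes enter only through the text prompts, which is what makes the classification zero-shot.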

Keywords