Applied Sciences (Nov 2023)

Vision-Language Models for Zero-Shot Classification of Remote Sensing Images

  • Mohamad Mahmoud Al Rahhal,
  • Yakoub Bazi,
  • Hebah Elgibreen,
  • Mansour Zuair

DOI: https://doi.org/10.3390/app132212462
Journal volume & issue: Vol. 13, no. 22, p. 12462

Abstract

Zero-shot classification is challenging because it requires a model to categorize images from classes it never encountered during training. Previous work in remote sensing (RS) has addressed this task by training image-based models on known RS classes and then predicting labels for unseen classes, but the resulting accuracy has been unsatisfactory. In this paper, we propose an alternative approach that leverages vision-language models (VLMs) pre-trained to capture the associations between image-text pairs drawn from diverse general computer vision datasets. Specifically, our investigation covers thirteen VLMs derived from Contrastive Language-Image Pre-Training (CLIP/Open-CLIP) with varying parameter counts. In our experiments, we identify the most suitable prompt for querying the language capabilities of the VLM with RS images. Furthermore, we show that the zero-shot classification accuracy on three widely used RS scene datasets, particularly with large CLIP models, surpasses that of existing RS solutions.
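To illustrate the approach the abstract describes, below is a minimal sketch of CLIP-style zero-shot classification using the open-source open_clip package. The class names, the prompt template ("a satellite photo of a ..."), and the image path are illustrative placeholders, not the paper's tuned prompt or datasets; the ViT-B/32 checkpoint is just one of the model sizes a study like this might compare.

```python
import torch
import open_clip
from PIL import Image

# Load a pre-trained Open-CLIP model; the paper evaluates thirteen
# CLIP/Open-CLIP variants of varying parameter counts.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Hypothetical RS scene classes; the prompt template is a placeholder,
# since the best prompt is determined experimentally in the paper.
class_names = ["airport", "forest", "harbor", "residential area"]
prompts = [f"a satellite photo of a {c}" for c in class_names]

image = preprocess(Image.open("scene.jpg")).unsqueeze(0)
text = tokenizer(prompts)

with torch.no_grad():
    # Embed the image and each class prompt in the shared space.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then rank classes by cosine similarity to the image.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

pred = class_names[probs.argmax(dim=-1).item()]
print(f"Predicted class: {pred}")
```

No RS-specific training is involved: the unseen classes enter only through the text prompts, which is what makes the classification zero-shot.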

Keywords