Semi-Supervised Implicit Augmentation for Data-Scarce VQA

Bhargav Dodla; Kartik Hegde; A. N. Rajagopalan

doi:10.3390/cmsf2024009003

Computer Sciences & Mathematics Forum (Feb 2024)

Affiliations

Bhargav Dodla: Indian Institute of Technology, Madras 600036, India
Kartik Hegde: Indian Institute of Technology, Madras 600036, India
A. N. Rajagopalan: Indian Institute of Technology, Madras 600036, India

DOI: https://doi.org/10.3390/cmsf2024009003
Journal volume & issue: Vol. 9, no. 1, p. 3

Abstract

Vision-language models (VLMs) have become increasingly capable of solving complex vision-language tasks in recent years. Visual question answering (VQA) is one of the primary downstream tasks for assessing VLMs, as it gauges a model's multimodal understanding through open-ended questions. The broad contextual information learned during pretraining can be used effectively to fine-tune a VQA model for specific datasets. However, specialised VQA datasets, such as OK-VQA and A-OKVQA (outside-knowledge-based) and ArtVQA (domain-specific), have relatively few images and corresponding question-answer annotations in their training sets. Such datasets can be categorised as data-scarce, and the limited information available hinders effective learning in VLMs. We introduce SemIAug (Semi-Supervised Implicit Augmentation), a model- and dataset-agnostic strategy designed to address the challenges posed by limited data availability in domain-specific VQA datasets. SemIAug uses the annotated image-question data present within the chosen dataset and augments it with meaningful new image-question associations. We show that SemIAug improves VQA performance on data-scarce datasets without the need for additional data or labels.

Published in Computer Sciences & Mathematics Forum

ISSN: 2813-0324 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/csmf
