IEEE Access (Jan 2024)
Subjective Scoring Framework for VQA Models in Autonomous Driving
Abstract
The development of vision-and-language transformer models has paved the way for Visual Question Answering (VQA) models and related research. Existing metrics assess the general accuracy of VQA models, but a subjective assessment of the generated answers is necessary to gain an in-depth understanding, and a framework for such assessment is required. This work develops a novel scoring system based on the subjectivity of the question and analyses the answers generated by the models using several natural language processing models (bert-base-uncased, nli-distilBERT-base, all-mpnet-base-v2, and GPT-2) and a sentence-similarity benchmark metric (cosine similarity). A case study is also presented that applies the proposed subjective scoring framework to three prominent VQA models (ViLT, ViLBERT, and LXMERT) on an automotive dataset. The proposed framework aids in analyzing the shortcomings of the discussed VQA models from a driving perspective, and the results help determine which model would work best when fine-tuned on a driving-specific VQA dataset.
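The cosine-similarity metric named above can be sketched as follows. The embedding vectors here are toy placeholders; in practice they would come from a sentence encoder such as all-mpnet-base-v2 (this is an illustrative sketch, not the paper's exact pipeline):

```python
from math import sqrt

def cosine_similarity(u, v):
    # Cosine similarity: dot(u, v) / (||u|| * ||v||), ranging over [-1, 1].
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional "embeddings" standing in for sentence-encoder output.
ref_answer = [0.2, 0.7, 0.1, 0.5]   # reference (ground-truth) answer
vqa_answer = [0.25, 0.6, 0.2, 0.4]  # answer generated by a VQA model

score = cosine_similarity(ref_answer, vqa_answer)
print(round(score, 3))
```

A score near 1 indicates that the generated answer is semantically close to the reference; real embeddings are typically several hundred dimensions, but the formula is unchanged.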
Keywords