Jisuanji kexue (Dec 2022)

Visual Question Answering Method Based on Counterfactual Thinking

  • YUAN De-sen, LIU Xiu-jing, WU Qing-bo, LI Hong-liang, MENG Fan-man, NGAN King-ngi, XU Lin-feng

DOI
https://doi.org/10.11896/jsjkx.220600038
Journal volume & issue
Vol. 49, no. 12
pp. 229–235

Abstract


Visual question answering (VQA) is a multi-modal task that combines computer vision and natural language processing, and it is extremely challenging. Current VQA models, however, are often misled by apparent correlations in the data, and their outputs are directly guided by language bias. Much previous research focuses on mitigating language bias and assisting the model with counterfactual-sample methods. These studies, however, ignore the prediction information and the difference between key and non-key features in counterfactual samples. In view of this, this paper proposes a contrastive-learning paradigm based on counterfactual samples. The proposed model distinguishes among the original sample, the factual sample, and the counterfactual sample; by comparing these three samples in terms of both feature gaps and prediction gaps, the robustness of the VQA model is significantly improved. Compared with the CL-VQA method, the overall accuracy, average accuracy, and Num index of the proposed method improve by 0.19%, 0.89%, and 2.6%, respectively. Compared with the CSSVQA method, the Gap of the proposed method decreases from 0.96 to 0.45.
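The abstract describes comparing three samples (original, factual, counterfactual) on two levels: a feature gap and a prediction gap. The paper's exact loss is not given here, so the following is only a minimal sketch of one plausible formulation: a triplet-style margin loss in feature space plus a KL-based term in prediction space. All function names, margins, and weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def kl(p, q, eps=1e-8):
    # KL divergence between two answer distributions (clipped for stability).
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def counterfactual_contrastive_loss(z_orig, z_fact, z_cf,
                                    p_orig, p_fact, p_cf,
                                    margin=0.2, alpha=1.0, beta=1.0):
    """Hypothetical sketch of a counterfactual contrastive objective.

    z_* are feature vectors and p_* are predicted answer distributions
    for the original, factual, and counterfactual samples.
    """
    # Feature-level gap: the original sample should lie closer to the
    # factual sample than to the counterfactual one (triplet margin).
    feat_loss = max(0.0, margin + cosine(z_orig, z_cf) - cosine(z_orig, z_fact))

    # Prediction-level gap: the original prediction should agree with the
    # factual sample's prediction and diverge (up to a cap) from the
    # counterfactual sample's prediction.
    pred_loss = kl(p_orig, p_fact) - min(kl(p_orig, p_cf), margin)

    return alpha * feat_loss + beta * pred_loss
```

Under this sketch, a model that keeps the original close to the factual sample and far from the counterfactual one (in both spaces) receives a lower loss, which is the intuition the abstract attributes to comparing the three samples.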

Keywords