IEEE Access (Jan 2023)
Counterfactual Mix-Up for Visual Question Answering
Abstract
Counterfactuals have been shown to be a powerful method in Visual Question Answering in the alleviation of Visual Question Answering’s unimodal bias. However, existing counterfactual methods tend to generate samples that are not diverse or require auxiliary models to synthesize additional data. In this regard, we propose a more diverse and simple counterfactual sample synthesis method called Counterfactual Mix-Up (CoMiU), which generates counterfactual image features and questions through batch-wise swapping in local object- and word-level. This method efficiently facilitates the generation of more abundant and diverse counterfactual samples, which help improve the robustness of Visual Question Answering models. Moreover, with the creation of diverse counterfactual samples, we introduce two more robust and stable contrastive loss functions, namely Batch-Contrastive loss and Answer-Contrastive loss. We test our method on various challenging Visual Question Answering robustness testing setups to show the advantages of the proposed method compared with the current state-of-the-art methods.
Keywords