IET Image Processing (Apr 2022)

Visual question answering with gated relation‐aware auxiliary

  • Xiangjun Shao,
  • Zhenglong Xiang,
  • Yuanxiang Li

DOI
https://doi.org/10.1049/ipr2.12421
Journal volume & issue
Vol. 16, no. 5
pp. 1424–1432

Abstract

Great advances in computer vision and natural language processing have driven significant progress in visual question answering. In the visual question answering task, the visual representation is essential for understanding the image content. However, traditional methods rarely exploit the question‐related context information of the visual feature, or relation‐aware information, to capture a valuable visual representation. Therefore, a gated relation‐aware model is proposed to capture an enhanced visual representation for predicting the desired answer. The gated relation‐aware module learns relation‐aware information between the visual feature and, respectively, its context and a particular object in the image. In addition, the proposed module filters out unnecessary relation‐aware information through a gate guided by the semantic representation of the question. Experimental results show that the gated relation‐aware module yields a significant improvement across all answer categories.
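
To make the gating idea concrete, the following is a minimal sketch of one common way a question‐guided gate can be written; it is not the formulation from the paper, which the abstract does not specify. The function name `gated_fusion`, the sigmoid gate over the concatenated relation and question vectors, and the residual addition to the visual feature are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, relation, question, gate_weights, gate_bias):
    """Hypothetical question-guided gate (an assumption, not the paper's equations).

    visual, relation: lists of floats with the same length d.
    question: list of floats (question semantic representation).
    gate_weights: d rows, each of length len(relation) + len(question).
    gate_bias: list of d floats.

    Returns (enhanced, gates), where
        gates[i]    = sigmoid(gate_weights[i] . [relation; question] + gate_bias[i])
        enhanced[i] = visual[i] + gates[i] * relation[i]
    """
    joint = relation + question  # concatenate the relation and question vectors
    enhanced, gates = [], []
    for i, (v, r) in enumerate(zip(visual, relation)):
        score = sum(w * x for w, x in zip(gate_weights[i], joint)) + gate_bias[i]
        g = sigmoid(score)
        gates.append(g)
        enhanced.append(v + g * r)  # gate near 0 suppresses the relation signal
    return enhanced, gates


# Toy inputs (made up for illustration only).
visual = [1.0, 2.0]
relation = [0.4, -0.2]
question = [0.5, 0.1, -0.3]
zero_weights = [[0.0] * 5 for _ in range(2)]

# Zero weights and zero bias give a gate of 0.5 in every dimension.
half_open, half_gates = gated_fusion(visual, relation, question, zero_weights, [0.0, 0.0])

# A strongly negative bias closes the gate, leaving the visual feature nearly unchanged.
closed, closed_gates = gated_fusion(visual, relation, question, zero_weights, [-50.0, -50.0])
```

In this sketch, a gate value close to zero discards the relation‐aware signal for that dimension, while a value close to one passes it through; in a trained model the weights would be learned so that the question determines which relation‐aware information is kept.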