Jisuanji kexue (Jan 2023)

Knowledge-based Visual Question Answering: A Survey

  • WANG Ruiping, WU Shihong, ZHANG Meihang, WANG Xiaoping

DOI
https://doi.org/10.11896/jsjkx.211100237
Journal volume & issue
Vol. 50, no. 1
pp. 166 – 175

Abstract

As an important manifestation of AI-completeness and of the visual Turing test, visual question answering (VQA), together with its potential application value, has received extensive attention from the computer vision and natural language processing communities. Knowledge plays an important role in VQA: especially for complex, open-ended questions, reasoning knowledge and external knowledge are critical to obtaining correct answers. A question answering mechanism that incorporates knowledge is called knowledge-based visual question answering (Kb-VQA). To date, no systematic survey of Kb-VQA has been found. Studying how knowledge participates in VQA, and the forms in which it is expressed, can fill this gap in the literature on knowledge-based VQA systems. This paper investigates the constituent units of Kb-VQA, studies how knowledge exists within them, and proposes the concept of a knowledge hierarchy. It then summarizes the ways knowledge participates, and the forms it takes, in visual feature extraction, language feature extraction, and multimodal fusion, and discusses future development trends and research directions.

Keywords