IEEE Access (Jan 2024)
A Novel Open-Domain Question Answering System on Curated and Extracted Knowledge Bases With Consideration of Confidence Scores in Existing Triples
Abstract
In this paper, we present a novel approach to improving Open-Domain Knowledge Base Question Answering by leveraging both curated (Freebase) and extracted (Reverb) knowledge bases. Our hybrid system combines Span Detection using BERT (SD-BERT), which provides precise entity and relation span detection, with a Term Frequency-Inverse Document Frequency (TF-IDF) retrieval model enhanced by function scoring, balancing efficiency and accuracy. We also explore Contextual Late Interaction over BERT (ColBERTv2), a scalable retrieval model optimized for token-level interaction, to handle more complex queries involving multiple entities and relations. The system shows significant improvements on large-scale, hybrid datasets such as ReverbSimpleQuestions and SimpleQuestions, providing both high accuracy and scalability. In extensive evaluation, SD-BERT + TF-IDF with confidence scoring achieved a Hit@1 of 67.87%, outperforming several benchmark systems in fact-based query retrieval while maintaining real-time performance. The best performance, 83.63%, was achieved on the extracted knowledge base using the ColBERTv2 method.
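As a reading aid for the Hit@1 figures reported above, the following is a minimal sketch of how such a metric is commonly computed: the fraction of questions whose top-ranked candidate matches the gold answer. The function name `hit_at_k`, the list-of-strings data layout, and the toy data are assumptions made for illustration; they are not taken from the system described here.

```python
from typing import Sequence


def hit_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int = 1) -> float:
    """Fraction of questions whose gold answer is among the top-k candidates.

    ranked[i] is the candidate list for question i, best first (hypothetical layout).
    gold[i] is the single correct answer for question i.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    # Count a hit when the gold answer appears in the first k candidates.
    hits = sum(1 for cands, answer in zip(ranked, gold) if answer in cands[:k])
    return hits / len(gold)


# Toy data, invented for this example only.
ranked = [
    ["paris", "lyon"],
    ["berlin", "munich"],
    ["rome", "madrid"],
    ["oslo"],
]
gold = ["paris", "munich", "madrid", "oslo"]

hit1 = hit_at_k(ranked, gold, k=1)  # only questions 1 and 4 are right at rank 1
hit2 = hit_at_k(ranked, gold, k=2)  # all four gold answers fall within the top 2
print(f"Hit@1 = {hit1:.2%}, Hit@2 = {hit2:.2%}")
```

With `k=1` the function counts only exact top-rank matches, which is the strictest setting and the one the reported percentages refer to.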
Keywords