IEEE Access (Jan 2024)

A Novel Open-Domain Question Answering System on Curated and Extracted Knowledge Bases With Consideration of Confidence Scores in Existing Triples

  • Somayyeh Behmanesh,
  • Alireza Talebpour,
  • Mehrnoush Shamsfard,
  • Mohammad Mahdi Jafari

DOI
https://doi.org/10.1109/ACCESS.2024.3490452
Journal volume & issue
Vol. 12
pp. 160741 – 160760

Abstract

In this paper, we present a novel approach to improving Open-Domain Knowledge Base Question Answering by leveraging both curated (Freebase) and extracted (Reverb) knowledge bases. Our hybrid system combines Span Detection using BERT (SD-BERT), for precise entity and relation span detection, with a Term Frequency-Inverse Document Frequency (TF-IDF) retrieval model enhanced by function scoring, balancing efficiency and accuracy. Additionally, we explore Contextual Late Interaction over BERT (ColBERTv2), a scalable retrieval model optimized for token-level interaction, to handle more complex queries involving multiple entities and relations. Our system demonstrates significant improvements in handling large-scale, hybrid datasets such as ReverbSimpleQuestions and SimpleQuestions, providing both high accuracy and scalability. In extensive evaluation, SD-BERT + TF-IDF with confidence scoring achieved a Hit@1 of 67.87%, outperforming several benchmark systems in fact-based query retrieval while maintaining real-time performance. The best performance, 83.63%, was achieved on the extracted knowledge base using the ColBERTv2 method.
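
As a rough illustration of confidence-aware TF-IDF retrieval and the Hit@1 metric, the following minimal sketch ranks toy triples by TF-IDF cosine similarity multiplied by each triple's confidence score. The abstract does not say how the confidence score enters the function score, so the multiplication is an assumption, as are the toy triples, the whitespace tokenizer, and every function name. The SD-BERT span detection step is omitted and the whole question is used as the query.

```python
import math
from collections import Counter

# Toy extracted triples: (subject, relation, object, confidence).
# All values are invented for illustration; they are not from the paper.
TRIPLES = [
    ("barack obama", "was born in", "honolulu", 0.92),
    ("barack obama", "was born in", "kenya", 0.11),
    ("paris", "is the capital of", "france", 0.97),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()


def build_index(triples):
    """Return per-triple term counts over 'subject relation' and smoothed IDF weights."""
    docs = [Counter(tokenize(f"{s} {r}")) for s, r, _, _ in triples]
    n = len(docs)
    df = Counter(tok for doc in docs for tok in doc)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return docs, idf


def tfidf_vector(counts, idf):
    """Weight term counts by IDF; terms unseen in the index are dropped."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(question, triples, docs, idf):
    """Score each triple as text similarity times confidence (assumed combination)."""
    q_vec = tfidf_vector(Counter(tokenize(question)), idf)
    scored = [
        (cosine(q_vec, tfidf_vector(doc, idf)) * triple[3], triple)
        for triple, doc in zip(triples, docs)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def hit_at_1(examples, triples, docs, idf):
    """Fraction of questions whose top-ranked triple has the gold object."""
    hits = sum(
        1 for question, gold in examples
        if rank(question, triples, docs, idf)[0][1][2] == gold
    )
    return hits / len(examples)


docs, idf = build_index(TRIPLES)
examples = [
    ("where was barack obama born", "honolulu"),
    ("paris is the capital of which country", "france"),
]
print(rank(examples[0][0], TRIPLES, docs, idf)[0][1])
print(hit_at_1(examples, TRIPLES, docs, idf))
```

In this toy setup the two "barack obama" triples have identical text, so only the confidence score separates them. That is the situation in which weighting by confidence would be expected to help on a noisy extracted knowledge base.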

Keywords