IEEE Access (Jan 2020)

A Near-Real-Time Answer Discovery for Open-Domain With Unanswerable Questions From the Web

  • Mintae Kim,
  • Sangheon Lee,
  • Yeongtaek Oh,
  • Hyunseung Choi,
  • Wooju Kim

DOI
https://doi.org/10.1109/ACCESS.2020.3020245
Journal volume & issue
Vol. 8
pp. 158346 – 158355

Abstract

With the proliferation of question answering (Q&A) services, studies on building a knowledge base (KB) from unstructured Web data using various information extraction (IE) methodologies have received significant attention. Existing IE approaches, including machine reading comprehension (MRC), can find the correct answer to a question if that answer exists in the document. However, when the correct answer does not exist in the given documents, most of them tend to extract an incorrect answer rather than return no answer. This weakness is likely to cause serious problems when such technologies are applied to practical services such as AI speakers. We propose a novel open-domain IE system to alleviate the weaknesses of previous approaches. The proposed system integrates elaborate document selection, sentence selection, and a knowledge extraction ensemble method to obtain high specificity while maintaining a realistically achievable level of precision. Based on this framework, we extract answers to Korean open-domain user queries from unstructured documents collected from multiple Web sources. To evaluate our system, we build a benchmark dataset from the SKTelecom AI Speaker log. Two baseline models, the KYLIN infobox generator and BiDAF, were used to evaluate the performance of the proposed approach. The experimental results demonstrate that the proposed method outperforms the baseline models and is practically applicable to real-world services.
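The trade-off named in the abstract, high specificity alongside usable precision, can be made concrete with a small sketch. It is illustrative only and is not the authors' implementation: the `Example` record, the `answer_if_agreed` voting rule, and the `min_votes` threshold are assumptions introduced here. Specificity is taken in its standard sense, the share of unanswerable questions on which the system correctly returns no answer, and precision as the share of returned answers that are correct.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Example:
    """One evaluated question (hypothetical record, not from the paper)."""
    gold: Optional[str]       # None means the question is unanswerable
    predicted: Optional[str]  # None means the system returned no answer


def specificity(examples: Sequence[Example]) -> float:
    """Share of unanswerable questions on which the system abstained."""
    negatives = [e for e in examples if e.gold is None]
    if not negatives:
        return 0.0
    return sum(e.predicted is None for e in negatives) / len(negatives)


def precision(examples: Sequence[Example]) -> float:
    """Share of returned answers that match the gold answer."""
    emitted = [e for e in examples if e.predicted is not None]
    if not emitted:
        return 0.0
    return sum(e.predicted == e.gold for e in emitted) / len(emitted)


def answer_if_agreed(candidates: Sequence[Optional[str]],
                     min_votes: int) -> Optional[str]:
    """Assumed ensemble rule: answer only if enough extractors agree.

    Each entry in `candidates` is one extractor's answer, or None if
    that extractor found nothing. Returning None means "no answer".
    """
    votes = Counter(c for c in candidates if c is not None)
    if not votes:
        return None
    answer, count = votes.most_common(1)[0]
    return answer if count >= min_votes else None


if __name__ == "__main__":
    results = [
        Example(gold="Seoul", predicted="Seoul"),  # correct answer
        Example(gold=None, predicted=None),        # correct abstention
        Example(gold=None, predicted="Busan"),     # wrong answer emitted
        Example(gold="1446", predicted=None),      # missed answer
    ]
    print("specificity:", specificity(results))
    print("precision:", precision(results))
    print("vote:", answer_if_agreed(["Seoul", "Seoul", None], min_votes=2))
```

Under a rule of this kind, raising `min_votes` makes the system abstain more often, which tends to raise specificity at the cost of answering fewer questions.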

Keywords