A Near-Real-Time Answer Discovery for Open-Domain With Unanswerable Questions From the Web

Mintae Kim; Sangheon Lee; Yeongtaek Oh; Hyunseung Choi; Wooju Kim

doi:10.1109/ACCESS.2020.3020245

IEEE Access (Jan 2020)

Affiliations

Mintae Kim: Department of Industrial Engineering, Yonsei University, Seoul, South Korea
Sangheon Lee: Department of Industrial Engineering, Yonsei University, Seoul, South Korea
Yeongtaek Oh: Department of Industrial Engineering, Yonsei University, Seoul, South Korea
Hyunseung Choi: Department of Industrial Engineering, Yonsei University, Seoul, South Korea
Wooju Kim: Department of Industrial Engineering, Yonsei University, Seoul, South Korea

DOI: https://doi.org/10.1109/ACCESS.2020.3020245
Journal volume & issue: Vol. 8, pp. 158346–158355

Abstract

With the proliferation of question answering (Q&A) services, studies on building a knowledge base (KB) from unstructured Web data using various information extraction (IE) methods have received significant attention. Existing IE approaches, including machine reading comprehension (MRC), can find the correct answer to a question if that answer exists in the document. However, when the correct answer does not exist in the given documents, most of them tend to extract an incorrect answer rather than return no answer. This weakness is likely to cause serious problems when such technologies are applied to practical services such as AI speakers. We propose a novel open-domain IE system to alleviate the weaknesses of previous approaches. The proposed system integrates elaborate document selection, sentence selection, and a knowledge extraction ensemble method to obtain high specificity while maintaining a realistically achievable level of precision. Based on this framework, we extract answers to Korean open-domain user queries from unstructured documents collected from multiple Web sources. To evaluate our system, we build a benchmark dataset from the SK Telecom AI Speaker log. The KYLIN infobox generator and BiDAF serve as baseline models for evaluating the proposed approach. The experimental results demonstrate that the proposed method outperforms the baseline models and is practically applicable to real-world services.
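
The trade-off the abstract describes, abstaining on unanswerable questions while keeping precision acceptable, can be illustrated with a minimal sketch. This is not the authors' system: the majority-vote rule, the `min_votes` threshold, and the function names are all assumptions chosen for illustration, and specificity is taken here to mean the share of unanswerable questions on which the system correctly returns no answer.

```python
from collections import Counter
from typing import Optional, Sequence


def ensemble_answer(candidates: Sequence[Optional[str]],
                    min_votes: int = 2) -> Optional[str]:
    """Hypothetical vote-based ensemble: return the most frequent answer,
    or None (abstain) when fewer than `min_votes` extractors agree."""
    votes = Counter(c for c in candidates if c is not None)
    if not votes:
        return None  # every extractor abstained
    answer, count = votes.most_common(1)[0]
    return answer if count >= min_votes else None


def specificity(predictions: Sequence[Optional[str]],
                gold: Sequence[Optional[str]]) -> float:
    """Share of unanswerable questions (gold is None) where the system abstained."""
    on_unanswerable = [p for p, g in zip(predictions, gold) if g is None]
    if not on_unanswerable:
        return 0.0
    return sum(p is None for p in on_unanswerable) / len(on_unanswerable)


def precision(predictions: Sequence[Optional[str]],
              gold: Sequence[Optional[str]]) -> float:
    """Share of returned answers that match the gold answer."""
    answered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    if not answered:
        return 0.0
    return sum(p == g for p, g in answered) / len(answered)
```

Raising `min_votes` in this sketch makes the ensemble abstain more often, which tends to raise specificity at the cost of answering fewer questions.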

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639