IEEE Access (Jan 2020)
QA4IE: A Question Answering Based System for Document-Level General Information Extraction
Abstract
Information Extraction (IE) is the task of distilling structured information from unstructured text by identifying references to named entities as well as relationships between those entities. Existing IE solutions, including Relation Extraction and Open IE, can hardly take cross-sentence information such as coreference into account, and are severely restricted by limited relation types as well as informal relation specifications (e.g., free-text-based relation triples). To overcome these weaknesses, we propose a novel IE framework named QA4IE, which leverages flexible question answering approaches to produce high-quality relation triples across sentences. Based on this framework, we develop a real-time IE system that can perform general IE over an entire document. To train and evaluate our system, we build a large-scale IE benchmark using distant supervision with human evaluation. We conduct both component analyses and pipeline experiments to evaluate our system. The results show that our system generalizes to unseen entities and relations and achieves significant improvements over existing IE systems.
Keywords