IEEE Access (Jan 2019)

Progressive Joint Framework for Chinese Question Entity Discovery and Linking With Question Representations

  • Ziqi Lin,
  • Haidong Zhang,
  • Wancheng Ni,
  • Yiping Yang

DOI
https://doi.org/10.1109/ACCESS.2019.2944223
Journal volume & issue
Vol. 7
pp. 146282 – 146300

Abstract

Chinese question entity discovery and linking (QEDL) can involve short texts and small-scale annotated datasets, which may render certain machine learning algorithms ineffective. In this paper, we propose a progressive joint framework for Chinese QEDL that leverages the mutual dependency between the two tasks so that each improves the performance of the other. The framework uses the candidate entity generation (CEG) step of entity linking to iteratively augment the overall entity discovery process, which consists of mention generation, filtering and merging modules. In the mention generation module, to reduce the hand-crafting effort of rule-based entity discovery, we develop a question representation method that generates domain-independent entity discovery rules, and we use CEG to check the extracted mentions in priority order. The extracted mentions can be embedded into other entity discovery methods, either as a feature or as extra mentions, to alleviate the shortage of annotated data. The mention filtering module leverages the joint features of the extracted mentions and the entities returned by CEG to build a voting model that filters out low-confidence mentions. The mention merging module then merges the mention-entity pairs produced by different patterns and checks their corresponding candidate entities with CEG. During entity linking, we incorporate the joint features of questions, extracted mentions and CEG entities into a ranking model for entity disambiguation. Finally, we conduct experiments on two real datasets and compare our approach with other state-of-the-art methods. The results show that the proposed framework can reduce error accumulation and flexibly combine different entity discovery methods, which significantly improves performance on small-scale datasets.
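
To make the filtering idea concrete, the following is a minimal sketch of voting-based mention filtering backed by a candidate entity lookup. It does not reproduce the paper's model: the alias table, the stopword list, the three voters, and the names `generate_candidates` and `filter_mentions` are all hypothetical stand-ins chosen for illustration, and the abstract does not specify the actual features or voting scheme.

```python
from typing import Callable, Dict, List

# Hypothetical alias table standing in for a knowledge-base lookup (the CEG step).
ALIAS_TABLE: Dict[str, List[str]] = {
    "北京": ["Beijing_(city)"],
    "苹果": ["Apple_Inc.", "Apple_(fruit)"],
}

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"的", "哪里", "什么"}


def generate_candidates(mention: str) -> List[str]:
    """Return candidate entities for a mention (empty list if none are known)."""
    return ALIAS_TABLE.get(mention, [])


# A voter looks at a mention and its candidates and casts a yes/no vote.
Voter = Callable[[str, List[str]], bool]


def has_candidates(mention: str, candidates: List[str]) -> bool:
    return len(candidates) > 0


def long_enough(mention: str, candidates: List[str]) -> bool:
    return len(mention) >= 2


def not_stopword(mention: str, candidates: List[str]) -> bool:
    return mention not in STOPWORDS


# Illustrative voters only; the paper's joint features are not given in the abstract.
VOTERS: List[Voter] = [has_candidates, long_enough, not_stopword]


def filter_mentions(mentions: List[str], min_votes: int = 2) -> List[str]:
    """Keep mentions that receive at least `min_votes` positive votes."""
    kept = []
    for mention in mentions:
        candidates = generate_candidates(mention)
        votes = sum(voter(mention, candidates) for voter in VOTERS)
        if votes >= min_votes:
            kept.append(mention)
    return kept


if __name__ == "__main__":
    print(filter_mentions(["北京", "的", "苹果", "哪里"]))
```

In this toy setting, a mention with no candidate entities can survive only if every remaining voter supports it, which loosely mirrors how candidate generation feeds back into entity discovery in the framework described above.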

Keywords