IEEE Access (Jan 2020)

Keyphrase Generation With CopyNet and Semantic Web

  • Xun Zhu,
  • Chen Lyu,
  • Donghong Ji

DOI
https://doi.org/10.1109/ACCESS.2020.2977508
Journal volume & issue
Vol. 8
pp. 44202 – 44210

Abstract

Keyphrases provide core information that helps users understand a document. Most previous work uses machine-learning-based methods for keyphrase extraction and achieves promising performance. However, these methods focus on identifying keyphrases in the input text and cannot produce keyphrases that do not appear in the text. In this paper, we present an encoder-decoder framework that incorporates a copying mechanism to generate keyphrases for a given text. This framework (CopyNet) integrates a generation part and a copying part. The generation part generates keyphrases from a predefined vocabulary, and the copying part takes keyphrases from the source text. Furthermore, we improve CopyNet by assigning different probabilities to the two parts. To incorporate more related information for keyphrase generation, an automatically built keyphrase semantic web is merged into the dataset and takes part in training the neural network. Semantic-similarity-based and word-co-occurrence-based methods are used to construct the keyphrase semantic web. We build a large-scale biomedical keyphrase dataset to evaluate system performance. Experiments show that our improved CopyNet achieves better performance with different proportions of the generation and copying parts, and that incorporating the semantic web also effectively improves keyphrase generation.
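As a rough illustration of how a generation part and a copying part can be combined with different weights, the sketch below mixes two probability distributions for a single decoding step. It is not the authors' implementation: the function names, the `gen_weight` parameter, the toy scores, and the choice of a separate softmax per part followed by a fixed linear mix are all assumptions made for illustration. The original CopyNet formulation normalizes generation and copy scores jointly, and the abstract does not say how the paper sets the two probabilities.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_generate_and_copy(vocab, gen_scores, source_tokens, copy_scores,
                          gen_weight=0.5):
    """Combine generation and copy distributions for one decoding step.

    gen_weight is a hypothetical knob (assumption, not from the paper):
    the share of probability mass given to the generation part. The
    copying part receives the remaining 1 - gen_weight.
    """
    if not 0.0 <= gen_weight <= 1.0:
        raise ValueError("gen_weight must lie in [0, 1]")
    gen_probs = softmax(gen_scores)
    copy_probs = softmax(copy_scores)

    mixed = {}
    # Mass from the generation part, over the predefined vocabulary.
    for word, p in zip(vocab, gen_probs):
        mixed[word] = mixed.get(word, 0.0) + gen_weight * p
    # Mass from the copying part, over the source positions. A token that
    # occurs several times in the source accumulates mass from each position.
    for token, p in zip(source_tokens, copy_probs):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - gen_weight) * p
    return mixed


# Toy inputs (made up): "BRCA1" is out of vocabulary and reachable only by copying.
vocab = ["protein", "gene", "<unk>"]
gen_scores = [1.0, 0.5, 0.0]
source_tokens = ["BRCA1", "gene", "mutation"]
copy_scores = [2.0, 1.0, 0.5]

dist = mix_generate_and_copy(vocab, gen_scores, source_tokens, copy_scores,
                             gen_weight=0.3)
best = max(dist, key=dist.get)
```

Lowering `gen_weight` shifts probability toward words present in the source text, while raising it favors the predefined vocabulary; a word that appears in both places, such as "gene" here, collects mass from both parts.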

Keywords