IEEE Access (Jan 2019)

Document Specific Supervised Keyphrase Extraction With Strong Semantic Relations

  • Huiting Liu,
  • Lili Wang,
  • Peng Zhao,
  • Xindong Wu

DOI
https://doi.org/10.1109/ACCESS.2019.2948891
Journal volume & issue
Vol. 7
pp. 167507 – 167520

Abstract

Read online

Keyphrase extraction is the task of automatically extracting descriptive phrases or concepts that represent the main topics in a document. Finding good keyphrases in a document can quickly summarize knowledge for information retrieval and decision making. Existing keyphrase extraction methods cannot be customized to each specific document, and cannot capture flexible semantic relations. In this paper, a keyphrase extraction algorithm using maximum sequential pattern mining with one-off and general gaps condition, called Ke-MSMING, is presented. Ke_MSMING first searches all keyphrase candidates from a document using sequential patterns mining and the topic model, and then adopts supervised machine learning to classify each keyphrase candidate as a keyphrase or not. Finally, Ke_MSMING selects top-N keyphrases as the final keyphrases. Ke_MSMING not only uses baseline features and pattern features but also uses centrality features obtained from the cooccurrence semantic network, and the cooccurrence networks can yield powerful semantic relations for keyphrase extraction. Experimental results on two datasets demonstrate that Ke_MSMING has better performance than other state-of-the-art keyphrase extraction approaches.

Keywords