Document Specific Supervised Keyphrase Extraction With Strong Semantic Relations

Huiting Liu; Lili Wang; Peng Zhao; Xindong Wu

doi:10.1109/ACCESS.2019.2948891

IEEE Access (Jan 2019)

Affiliations

Huiting Liu: Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, Anhui University, Hefei, China
Lili Wang: Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, Anhui University, Hefei, China
Peng Zhao: Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, Anhui University, Hefei, China
Xindong Wu: Research Institute of Big Knowledge, Hefei University of Technology, Hefei, China

DOI: https://doi.org/10.1109/ACCESS.2019.2948891
Journal volume & issue: Vol. 7
pp. 167507 – 167520

Abstract

Keyphrase extraction is the task of automatically extracting descriptive phrases or concepts that represent the main topics of a document. Finding good keyphrases in a document can quickly summarize knowledge for information retrieval and decision making. Existing keyphrase extraction methods cannot be customized to each specific document and cannot capture flexible semantic relations. In this paper, a keyphrase extraction algorithm using maximum sequential pattern mining with one-off and general gaps condition, called Ke-MSMING, is presented. Ke_MSMING first searches a document for all keyphrase candidates using sequential pattern mining and the topic model, and then adopts supervised machine learning to classify each candidate as a keyphrase or not. Finally, Ke_MSMING selects the top-N keyphrases as the final keyphrases. Ke_MSMING uses not only baseline features and pattern features but also centrality features obtained from the co-occurrence semantic network; the co-occurrence networks can yield powerful semantic relations for keyphrase extraction. Experimental results on two datasets demonstrate that Ke_MSMING performs better than other state-of-the-art keyphrase extraction approaches.
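
To make the idea of centrality features from a co-occurrence network more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes whitespace-tokenized input, a sliding window of three tokens, and degree centrality as the centrality measure; none of these choices is stated in the abstract. The names `cooccurrence_graph`, `degree_centrality`, `candidate_score`, and `top_n` are hypothetical. The final ranking by average centrality is a simple stand-in for the supervised classifier the abstract describes.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=3):
    """Link distinct words that fall inside the same run of `window` consecutive tokens.

    The window size is an assumption made for illustration.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        # Look at the next (window - 1) tokens after position i.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each word is directly linked to."""
    n = len(graph)
    if n < 2:
        return {word: 0.0 for word in graph}
    return {word: len(neighbours) / (n - 1) for word, neighbours in graph.items()}


def candidate_score(candidate, centrality):
    """Average centrality of the words in a candidate phrase (one possible feature)."""
    words = candidate.split()
    return sum(centrality.get(w, 0.0) for w in words) / len(words)


def top_n(candidates, centrality, n=2):
    """Rank candidates by score and keep the best n.

    Stand-in for the supervised classification step, which is not reproduced here.
    """
    ranked = sorted(candidates, key=lambda c: candidate_score(c, centrality), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    text = "sequential pattern mining finds keyphrase candidates and pattern mining ranks keyphrase candidates"
    graph = cooccurrence_graph(text.split())
    centrality = degree_centrality(graph)
    print(top_n(["pattern mining", "keyphrase candidates", "sequential"], centrality))
```

In a supervised setting, a score such as `candidate_score` would be one column in a feature vector alongside the baseline and pattern features, rather than the sole ranking criterion.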

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
