Ranking in keyphrase extraction problem: is it suitable to use statistics of words occurrences?

S. V. Popova; I. A. Khodyrev

doi:10.15514/ISPRAS-2014-26(4)-10

Труды Института системного программирования РАН (Oct 2018)

Affiliations

S. V. Popova: Saint Petersburg State University; ITMO University
I. A. Khodyrev: ITMO University

DOI: https://doi.org/10.15514/ISPRAS-2014-26(4)-10
Journal volume & issue: Vol. 26, no. 4
pp. 123 – 136

Abstract

The paper addresses the keyphrase extraction problem for single documents, e.g. scientific abstracts. Keyphrase extraction is an important task, and its results can be used in a variety of applications: data indexing, clustering and classification of documents, meta-information extraction, automatic ontology creation, etc. We discuss an approach to keyphrase extraction whose first step is building candidate phrases, which are then ranked, and the best are selected as keyphrases. The paper focuses on evaluating approaches to weighting candidate phrases in unsupervised extraction methods. A number of in-phrase word weighting procedures are evaluated. Unsuitable approaches to weighting are identified. Testing shows that some approaches are equivalent as applied to keyphrase extraction. A feature is proposed that increases the quality of extracted keyphrases and shows better results than the state of the art. Experiments are based on the Inspec dataset.
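
The pipeline outlined in the abstract (build candidate phrases, weight them, keep the best) can be illustrated with a minimal Python sketch. This is not the authors' method: the stopword list, the stopword-based phrase splitting, and the choice to score a phrase by the summed occurrence counts of its words are all assumptions made for illustration, and the function names `candidate_phrases` and `rank_phrases` are hypothetical. Occurrence-count weighting is only a simple baseline of the kind whose suitability the paper questions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "for", "in", "to", "and",
             "is", "are", "on", "with", "by"}


def candidate_phrases(text):
    """Build candidate phrases by cutting the text at punctuation and stopwords."""
    phrases = []
    # Split into fragments at any run of punctuation (hyphens are kept in words).
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rank_phrases(text, top_k=3):
    """Rank candidates by the summed occurrence counts of their words (assumed weighting)."""
    phrases = candidate_phrases(text)
    # Count how often each word occurs across all candidate phrases.
    word_freq = Counter(word for phrase in phrases for word in phrase)
    scores = {phrase: sum(word_freq[w] for w in phrase) for phrase in set(phrases)}
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(" ".join(phrase), score) for phrase, score in ranked[:top_k]]


if __name__ == "__main__":
    sample = "Keyphrase extraction for documents. Ranking of keyphrase candidates."
    for phrase, score in rank_phrases(sample, top_k=2):
        print(phrase, score)
```

Summing word counts favours longer phrases and frequent words, which is one reason a plain occurrence statistic may be a poor ranking signal; swapping the scoring line for another weighting leaves the rest of the sketch unchanged.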

Published in Труды Института системного программирования РАН

ISSN: 2079-8156 (Print); 2220-6426 (Online)
Publisher: Ivannikov Institute for System Programming of the Russian Academy of Sciences
Country of publisher: Russian Federation
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://ispranproceedings.elpub.ru/jour/index
