Proceedings of the XXth Conference of Open Innovations Association FRUCT (Nov 2013)

Stop-words in keyphrase extraction problem

  • S. Popova
  • L. Kovriguina
  • D. Mouromtsev
  • I. Khodyrev

DOI
https://doi.org/10.1109/FRUCT.2013.6737953
Journal volume & issue
Vol. 232, no. 14
pp. 113 – 121

Abstract

The keyword extraction problem is one of the most significant tasks in information retrieval. High-quality keyword extraction considerably influences progress in downstream information retrieval subtasks such as classification and clustering, data mining, and knowledge extraction and representation. The research community has established a general framework for keyphrase extraction; however, some possible solutions remain outside this paradigm. In this paper, the authors survey interdisciplinary methods applicable to automatically populating stop lists. The chosen method belongs to the class of experiential models. A research procedure based on this method improves the quality of keyphrase extraction at the candidate keyphrase generation stage. The paper also proposes several ways to populate stop lists automatically. One is based on principles of lexical statistics; applying it to this task reveals the non-Gaussian nature of text corpora. The second, which uses the Inspec training collection to populate the stop lists, improves extraction quality considerably.
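The role of a stop list at the candidate-building stage, and the idea of populating it from a training collection with gold keyphrases, can be illustrated with a small sketch. It is not the authors' procedure. The sketch assumes a simple heuristic: a word becomes a stop word if it appears in at least `min_doc_freq` training documents but (almost) never inside those documents' gold keyphrases. Candidates are then produced by splitting the text at stop words and punctuation, in the style of RAKE candidate generation. The function names, thresholds, seed list, and toy documents (standing in for Inspec training data) are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical generic seed list; the learned, collection-specific words are added to it.
SEED = {"the", "in", "of", "and", "from"}


def tokenize(text):
    """Lowercase word tokens; punctuation marks are kept as separate tokens (phrase boundaries)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*|[.,;:!?()]", text.lower())


def feed_stop_list(train_docs, min_doc_freq=2, max_keyphrase_ratio=0.0):
    """Assumed heuristic: flag frequent words that rarely occur inside gold keyphrases.

    train_docs is a list of (text, [gold keyphrases]) pairs.
    """
    doc_freq = Counter()  # number of documents whose text contains the word
    key_freq = Counter()  # number of documents where the word also occurs in a gold keyphrase
    for text, keyphrases in train_docs:
        words = {t for t in tokenize(text) if t[0].isalnum()}
        key_words = {t for kp in keyphrases for t in tokenize(kp) if t[0].isalnum()}
        doc_freq.update(words)
        key_freq.update(key_words & words)
    return {
        w for w, df in doc_freq.items()
        if df >= min_doc_freq and key_freq[w] / df <= max_keyphrase_ratio
    }


def candidate_phrases(text, stop_words):
    """Split the token stream at stop words and punctuation; the runs in between are candidates."""
    candidates, current = [], []
    for tok in tokenize(text):
        if tok in stop_words or not tok[0].isalnum():
            if current:
                candidates.append(" ".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        candidates.append(" ".join(current))
    return candidates


# Toy training pairs (hypothetical, standing in for an annotated collection such as Inspec).
train = [
    ("We propose a method for keyphrase extraction from scientific abstracts.",
     ["keyphrase extraction", "scientific abstracts"]),
    ("We evaluate a stop list for candidate generation in information retrieval.",
     ["stop list", "candidate generation", "information retrieval"]),
    ("The method improves keyphrase extraction for information retrieval.",
     ["keyphrase extraction", "information retrieval"]),
]

learned = feed_stop_list(train)
print(sorted(learned))  # ['a', 'for', 'method', 'we']
print(candidate_phrases(
    "We propose a method for stop list feeding in keyphrase extraction.", SEED | learned))
# ['propose', 'stop list feeding', 'keyphrase extraction']
```

In this toy run, "method" is learned as a collection-specific stop word because it is frequent but never part of a gold keyphrase. A generic stop list would miss such a word. The leftover candidate "propose" shows why a learned list only narrows the candidate set and still needs a later ranking or filtering step.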

Keywords