Rapid Unsupervised Keyphrase Extraction from Single Document

Svetlana Popova; Vera Danilova; John Cardiff

doi:10.23919/FRUCT64283.2024.10749871

Proceedings of the XXth Conference of Open Innovations Association FRUCT (Nov 2024)

Affiliations

Svetlana Popova: TUD Dublin
Vera Danilova: Uppsala University
John Cardiff: TUD

Journal volume & issue: Vol. 36, no. 1
pp. 609 – 616

Abstract

Keyphrases offer a concise representation of a document’s content. They are valuable for improving web search results and for enhancing tasks such as document tagging, text classification, and summarization. This makes keyphrase extraction an essential component of text mining. Among the constraints and features widely used in existing keyphrase extraction methods, we identified several effective techniques that have not yet been used together: Part-of-Speech (PoS) restrictions, extended stop-word lists, and position-based features. To address this gap, we propose an approach that combines automatically extracted extended stop-word lists with PoS restrictions on keyphrases and incorporates positional criteria. The main goal of this work was to develop a fast keyphrase extraction algorithm built on these three features. Experimental results on the INSPEC and SemEval 2010 datasets demonstrate the effectiveness of the proposed method.
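
As a rough illustration of how the three features named in the abstract might fit together, the sketch below builds candidate phrases from runs of tokens that pass a PoS filter and are not stop words, then ranks candidates by the position of their first occurrence. It is not the authors' algorithm: the tag set, the stop-word list, the first-occurrence ranking, and the name `extract_keyphrases` are all assumptions made for illustration, and the input is assumed to be already PoS-tagged.

```python
# Illustrative sketch only; not the method from the paper.
# ALLOWED_TAGS and STOP_WORDS are hypothetical values chosen for the example.
ALLOWED_TAGS = {"NOUN", "ADJ"}
STOP_WORDS = {"the", "of", "a", "is", "for", "and", "we", "paper", "propose"}


def extract_keyphrases(tagged_tokens, top_k=3):
    """Return up to top_k candidate phrases, earliest first occurrence first.

    tagged_tokens is a sequence of (word, pos_tag) pairs produced by any
    PoS tagger; tagging itself is outside the scope of this sketch.
    """
    first_position = {}  # phrase -> token index where it first starts
    current = []         # words of the candidate being built
    start = 0            # token index where the current candidate began

    # A sentinel token at the end flushes the final candidate.
    for i, (word, tag) in enumerate(list(tagged_tokens) + [("", "END")]):
        w = word.lower()
        if tag in ALLOWED_TAGS and w not in STOP_WORDS:
            if not current:
                start = i
            current.append(w)
        elif current:
            # A stop word or disallowed tag closes the candidate.
            first_position.setdefault(" ".join(current), start)
            current = []

    ranked = sorted(first_position, key=first_position.get)
    return ranked[:top_k]
```

In this sketch a phrase that appears earlier in the document ranks higher, which is one simple way to encode a positional criterion; a real system would likely combine position with other evidence.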

Published in Proceedings of the XXth Conference of Open Innovations Association FRUCT

ISSN: 2305-7254 (Print); 2343-0737 (Online)
Publisher: FRUCT
Country of publisher: Finland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication
Website: http://fruct.org/publication
