Complexity (Jan 2022)
Keyword Extraction for Medium-Sized Documents Using Corpus-Based Contextual Semantic Smoothing
Abstract
Keyword extraction refers to the process of selecting most significant, relevant, and descriptive terms as keywords, which are present inside a single document. Keyword extraction has major applications in the information retrieval domain, such as analysis, summarization, indexing, and search, of documents. In this paper, we present a novel supervised technique for extraction of keywords from medium-sized documents, namely Corpus-based Contextual Semantic Smoothing (CCSS). CCSS extends the concept of Contextual Semantic Smoothing (CSS), which considers term usage patterns in similar texts to improve term relevance information. We introduce four more features beyond CSS as our novel contributions in this work. We systematically compare the performance of CCSS with other techniques, when implemented over INSPEC dataset, where CCSS outperforms all state-of-the-art keyphrase extraction techniques presented in the literature.