Key Concept Identification: A Comprehensive Analysis of Frequency and Topical Graph-Based Approaches

Muhammad Aman; Abas bin Md Said; Said Jadid Abdul Kadir; Israr Ullah

doi:10.3390/info9050128

Information (May 2018)

Affiliations

Muhammad Aman: Department of Computer and Information Sciences, Universiti Teknologi Petronas, Seri Iskander 32610, Malaysia
Abas bin Md Said: Department of Computer and Information Sciences, Universiti Teknologi Petronas, Seri Iskander 32610, Malaysia
Said Jadid Abdul Kadir: Department of Computer and Information Sciences, Universiti Teknologi Petronas, Seri Iskander 32610, Malaysia
Israr Ullah: Department of Computer Engineering, Jeju National University, Jeju 63243, Korea

DOI: https://doi.org/10.3390/info9050128
Journal volume & issue: Vol. 9, no. 5, p. 128

Abstract

Automatic key concept extraction from text is a major challenge in information extraction, information retrieval, digital libraries, ontology learning, and text analysis. Statistical frequency-based ranking and topical graph-based ranking are the two leading families of unsupervised approaches devised to address this problem. To exploit the potential of these approaches and improve key concept identification, a comprehensive analysis of their performance on datasets from different domains is needed. The objective of this study is to perform a comprehensive empirical analysis of selected frequency-based and topical graph-based algorithms for key concept extraction on three different datasets, in order to identify the major sources of error in these approaches. For the experimental analysis, we selected TF-IDF, KP-Miner and TopicRank. We identify three major sources of error, namely frequency errors, syntactical errors and semantical errors, along with the factors that contribute to them. Analysis of the results reveals that the performance of the selected approaches is significantly degraded by these errors. These findings can help in developing an intelligent solution for key concept extraction in the future.

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
