Applied Sciences (Dec 2018)
Issue-Based Clustering of Scholarly Articles
Abstract
A scholarly article often discusses multiple research issues. The clustering of scholarly articles based on research issues can facilitate analyses of related articles on specific issues in scientific literature. It is a task of overlapping clustering, as an article may discuss multiple issues, and hence, be clustered into multiple clusters. Clustering is challenging, as it is difficult to identify the research issues with which to cluster the articles. In this paper, we propose the use of the titles of the references cited by the articles to tackle the challenge, based on the hypothesis that such information may indicate the research issues discussed in the article. A technique referred to as ICRT (Issue-based Clustering with Reference Titles) was thus developed. ICRT works as a post-processor for various clustering systems. In experiments on those articles that domain experts have selected to annotate research issues about specific entity associations, ICRT works with various clustering systems that employ state-of-the-art similarity measures for scholarly articles. ICRT successfully improves these systems by identifying clusters of articles with the same research focuses on specific entity associations. The contribution is of technical and practical significance to the exploration of research issues reported in scientific literature (supporting the curation of entity associations found in the literature).
Keywords