IEEE Access (Jan 2020)
A Systematic Approach to Map the Research Articles’ Sections to IMRAD
Abstract
The number of scientific publications is believed to double every five years. Citation indexes and digital libraries store these publications as complete PDF files and/or as terms extracted from the documents. This indexing practice poses several challenges, for the scientific community as well as for digital repositories, in handling advanced user requirements. For instance, a query such as “Give me the papers that contain the term ‘PageRank’ in their results section” cannot be answered unless the papers are indexed section-wise. This issue has attracted attention from researchers and from prestigious international challenges at top venues, such as the Semantic Publishing Challenge at ESWC. One important kind of metadata to extract from research papers is section information, such as the IMRAD structure (Introduction, Methodology, Results, and Discussion). Researchers have presented different approaches to identify section headings and map them to IMRAD sections. Existing studies have employed parameters such as dictionary terms, the template of a paper, and in-text citation frequency to map section headings onto logical sections. A critical analysis of the state of the art revealed that some highly promising features, which might yield more accurate mapping, have been ignored. In this study, we propose a novel approach that employs new features along with previously well-known features to map section headings to IMRAD. The newly proposed features are: (1) a variant of the in-text citation count, (2) figure counts, (3) table counts, and (4) implicit mapping of subheadings. The data set contains 5,000 research papers collected from CiteSeer. An evaluation of the proposed approach against three state-of-the-art approaches revealed improvements in average precision of 18.96%, 21.77%, and 9.50% over Ding et al., Shahid et al., and Habib et al., respectively.
This research has significant implications for citation indexes and digital libraries.
Keywords