IEEE Access (Jan 2023)

Exploiting Contextual Word Embedding for Identification of Important Citations: Incorporating Section-Wise Citation Counts and Metadata Features

  • Arshad Iqbal,
  • Abdul Shahid,
  • Muhammad Roman,
  • Muhammad Tanvir Afzal,
  • Muhammad Yahya

DOI
https://doi.org/10.1109/ACCESS.2023.3320038
Journal volume & issue
Vol. 11
pp. 114044 – 114060

Abstract

Finding relevant research papers can be challenging given the enormous number of scientific publications released each year. Recently, the scientific community has turned to citation analysis, specifically examining the content of papers to identify the more influential documents. Citations serve as potential parameters for establishing connections between research articles. They have been widely used for various academic purposes, including calculating journal impact factors, determining researchers’ h-index, allocating research grants, and identifying the latest research trends. However, researchers have argued that not all citations carry equal weight in terms of influence. Consequently, alternative techniques have been proposed to identify significant citations based on content, metadata, and bibliographic information. Nonetheless, the current state-of-the-art approaches still require further refinement, and the application of deep learning models and word embedding techniques in this context has not been extensively studied. In this research work, we present an approach consisting of two primary modules: 1) section-wise citation counts, and 2) metadata-based analysis of citation intent. Our study involves several experiments using deep learning models in conjunction with FastText, word2vec, and BERT-based word embeddings to perform citation analysis. These experiments were carried out on two benchmark datasets, and the results were compared with a contemporary study that employed a rich set of content-based features for classification. Our findings reveal that a deep learning CNN model coupled with FastText word embeddings achieves the best results in terms of accuracy, precision, and recall, outperforming the existing state-of-the-art model with a precision score of 0.97.
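The classification pipeline the abstract describes (pretrained word embeddings fed to a 1D convolutional classifier that scores a citation as important or incidental) can be sketched as below. This is a minimal NumPy illustration of the shapes involved, not the authors' implementation: the vocabulary, dimensions, and randomly initialized weights are all placeholders, and in practice the embedding table would hold real FastText vectors and the filters would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for pretrained FastText vectors
# (real FastText vectors are typically 300-dimensional; 8 here).
EMB_DIM, KERNEL, N_FILTERS = 8, 3, 4
vocab = {"method": 0, "is": 1, "used": 2, "by": 3, "author": 4, "<pad>": 5}
embeddings = rng.normal(size=(len(vocab), EMB_DIM))

def embed(tokens):
    """Map citation-context tokens to a (seq_len, emb_dim) matrix."""
    ids = [vocab.get(t, vocab["<pad>"]) for t in tokens]
    return embeddings[ids]

# Untrained conv filters and output weights, for shape illustration only.
W_conv = rng.normal(size=(N_FILTERS, KERNEL, EMB_DIM))
w_out = rng.normal(size=N_FILTERS)

def cnn_score(tokens):
    """1D convolution over the sequence, ReLU, global max pool, sigmoid."""
    x = embed(tokens)                          # (seq_len, emb_dim)
    seq_len = x.shape[0]
    conv = np.array([
        [np.sum(W_conv[f] * x[i:i + KERNEL])   # dot of filter with a window
         for i in range(seq_len - KERNEL + 1)]
        for f in range(N_FILTERS)
    ])                                         # (n_filters, seq_len - KERNEL + 1)
    pooled = np.maximum(conv, 0.0).max(axis=1) # ReLU + global max pooling
    return 1.0 / (1.0 + np.exp(-pooled @ w_out))  # P(important citation)

p = cnn_score(["method", "is", "used", "by", "author"])
print(float(p))
```

With trained weights, a threshold on this probability separates important from incidental citations; the paper's section-wise citation counts and metadata features would enter as additional inputs alongside the pooled convolutional features.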

Keywords