IEEE Access (Jan 2023)

Exploiting Contextual Word Embedding for Identification of Important Citations: Incorporating Section-Wise Citation Counts and Metadata Features

  • Arshad Iqbal,
  • Abdul Shahid,
  • Muhammad Roman,
  • Muhammad Tanvir Afzal,
  • Muhammad Yahya

DOI
https://doi.org/10.1109/ACCESS.2023.3320038
Journal volume & issue
Vol. 11
pp. 114044 – 114060

Abstract

Finding relevant research papers can be challenging given the enormous number of scientific publications released each year. Recently, the scientific community has turned to citation analysis, specifically examining the content of papers to identify the more influential documents. Citations serve as potential parameters for establishing connections between research articles. They have been widely used for various academic purposes, including calculating journal impact factors, determining researchers’ h-index, allocating research grants, and identifying the latest research trends. However, researchers have argued that not all citations carry equal weight in terms of influence. Consequently, alternative techniques have been proposed to identify significant citations based on content, metadata, and bibliographic information. Nonetheless, the current state-of-the-art approaches still require further refinement, and the application of deep learning models and word embedding techniques in this context has not been extensively studied. In this research work, we present an approach consisting of two primary modules: 1) section-wise citation counts, and 2) metadata-based analysis of citation intent. Our study involves several experiments using deep learning models in conjunction with FastText, word2vec, and BERT-based word embeddings to perform citation analysis. These experiments were carried out on two benchmark datasets, and the results were compared with a contemporary study that employed a rich set of content-based features for classification. Our findings reveal that a deep learning CNN model coupled with FastText word embeddings achieves the best results in terms of accuracy, precision, and recall, outperforming the existing state-of-the-art model with a precision score of 0.97.
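The classification pipeline the abstract describes (pretrained word embeddings fed to a 1D convolutional classifier that scores a citation as important or incidental) can be sketched as below. This is a minimal NumPy illustration of the shapes involved, not the authors' implementation: the vocabulary, dimensions, and randomly initialized weights are all placeholders, and in practice the embedding table would hold real FastText vectors and the filters would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for pretrained FastText vectors
# (real FastText vectors are typically 300-dimensional; 8 here).
EMB_DIM, KERNEL, N_FILTERS = 8, 3, 4
vocab = {"method": 0, "is": 1, "used": 2, "by": 3, "author": 4, "<pad>": 5}
embeddings = rng.normal(size=(len(vocab), EMB_DIM))

def embed(tokens):
    """Map citation-context tokens to a (seq_len, emb_dim) matrix."""
    ids = [vocab.get(t, vocab["<pad>"]) for t in tokens]
    return embeddings[ids]

# Untrained conv filters and output weights, for shape illustration only.
W_conv = rng.normal(size=(N_FILTERS, KERNEL, EMB_DIM))
w_out = rng.normal(size=N_FILTERS)

def cnn_score(tokens):
    """1D convolution over the sequence, ReLU, global max pool, sigmoid."""
    x = embed(tokens)                          # (seq_len, emb_dim)
    seq_len = x.shape[0]
    conv = np.array([
        [np.sum(W_conv[f] * x[i:i + KERNEL])   # dot of filter with a window
         for i in range(seq_len - KERNEL + 1)]
        for f in range(N_FILTERS)
    ])                                         # (n_filters, seq_len - KERNEL + 1)
    pooled = np.maximum(conv, 0.0).max(axis=1) # ReLU + global max pooling
    return 1.0 / (1.0 + np.exp(-pooled @ w_out))  # P(important citation)

p = cnn_score(["method", "is", "used", "by", "author"])
print(float(p))
```

With trained weights, a threshold on this probability separates important from incidental citations; the paper's section-wise citation counts and metadata features would enter as additional inputs alongside the pooled convolutional features.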

Keywords