EPJ Data Science (Oct 2023)

Investigating the contribution of author- and publication-specific features to scholars’ h-index prediction

  • Fakhri Momeni,
  • Philipp Mayr,
  • Stefan Dietze

DOI
https://doi.org/10.1140/epjds/s13688-023-00421-6
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 21

Abstract

Read online

Abstract Evaluation of researchers’ output is vital for hiring committees and funding bodies, and it is usually measured via their scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because it takes a while to get citations and increment of h-index. Hence, predicting the h-index can help to discover the researchers’ scientific impact. In addition, identifying the influential factors to predict the scientific impact is helpful for researchers and their organizations seeking solutions to improve it. This study investigates the effect of the author, paper/venue-specific features on the future h-index. For this purpose, we used a machine learning approach to predict the h-index and feature analysis techniques to advance the understanding of feature impact. Utilizing the bibliometric data in Scopus, we defined and extracted two main groups of features. The first relates to prior scientific impact, and we name it ‘prior impact-based features’ and includes the number of publications, received citations, and h-index. The second group is ‘non-prior impact-based features’ and contains the features related to author, co-authorship, paper, and venue characteristics. We explored their importance in predicting researchers’ h-index in three career phases. Also, we examined the temporal dimension of predicting performance for different feature categories to find out which features are more reliable for long- and short-term prediction. We referred to the gender of the authors to examine the role of this author’s characteristics in the prediction task. Our findings showed that gender has a very slight effect in predicting the h-index. Although the results demonstrate better performance for the models containing prior impact-based features for all researchers’ groups in the near future, we found that non-prior impact-based features are more robust predictors for younger scholars in the long term. Also, prior impact-based features lose their power to predict more than other features in the long term.

Keywords