Quantitative Science Studies (Jan 2023)
How reliable are unsupervised author disambiguation algorithms in the assessment of research organization performance?
Abstract
Assessing the performance of universities by output-to-input indicators requires knowledge of the individual researchers working within them. In Italy, the Ministry of University and Research maintains a database of university professors, but in countries where such databases are not available, measuring research performance is a formidable task. One possibility is to trace the research personnel of institutions indirectly through their publications, using bibliographic repertories together with author name disambiguation algorithms. This work evaluates the goodness-of-fit of the unsupervised algorithm of Caron and van Eck (CvE) by comparing the research performance of Italian universities obtained when the algorithm is applied to derive the universities’ research staff with that obtained using the supervised algorithm of D’Angelo, Giuffrida, and Abramo (2011), which draws on input data. Results show that the CvE algorithm overestimates the size of the research staff of organizations by 56%. Nonetheless, the performance scores and ranks recorded under the two modes show a high and significant correlation. Still, nine out of 69 universities show rank deviations of two quartiles. Measuring the extent of the distortions inherent in any evaluation exercise based on unsupervised algorithms can inform policymakers’ decisions on building national research staff databases rather than settling for unsupervised approaches.