Quantitative Science Studies (Jan 2020)
A comparison of large-scale science models based on textual, direct citation and hybrid relatedness
Abstract
AbstractRecent large-scale bibliometric models have largely been based on direct citation, and several recent studies have explored augmenting direct citation with other citation-based or textual characteristics. In this study we compare clustering results from direct citation, extended direct citation, a textual relatedness measure, and several citation-text hybrid measures using a set of nine million documents. Three different accuracy measures are employed, one based on references in authoritative documents, one using textual relatedness, and the last using document pairs linked by grants. We find that a hybrid relatedness measure based equally on direct citation and PubMed-related article scores gives more accurate clusters (in the aggregate) than the other relatedness measures tested. We also show that the differences in cluster contents between the different models are even larger than the differences in accuracy, suggesting that the textual and citation logics are complementary. Finally, we show that for the hybrid measure based on direct citation and related article scores, the larger clusters are more oriented toward textual relatedness, while the smaller clusters are more oriented toward citation-based relatedness.