Jiàoyù zīliào yǔ túshūguǎn xué (Nov 2011)

Ambiguity Resolution for Author Names of Bibliographic Data

  • Kuang-Ha Chen
  • Chi-Nan Hsieh

Journal volume & issue
Vol. 49, no. 2
pp. 215 – 240

Abstract

The rapid accumulation of scholarly information on the Internet has confronted users and researchers with serious ambiguity in author names, so research on author name disambiguation is indispensable. In contrast to previous work, this study attempts to address the problem using only the information contained in bibliographic data. Five features are used to disambiguate author names: co-author (C), article title (T), journal title (J), year (Y), and number of pages (P). Features Y and P have not been used before. Two supervised learning methods (Naïve Bayes and Support Vector Machine) and one unsupervised learning method (K-means) are employed to explore 28 different feature combinations. The findings show that journal title (J) and co-author (C) are very effective features. Feature J plays an important role in all three approaches, and feature C is effective with SVM. In addition, features Y and P clearly enhance accuracy, and the average improvement from including feature Y is more significant than that from including feature P (+2.5% on average). The results also show that the traditionally used feature combination CTJ does not outperform JYP, and that the feature combinations CJY, JY, and J are also very effective across the three methods. Finally, the accuracy of disambiguation on larger datasets is 10% lower than on smaller ones, which indicates the limitation of using bibliographic data alone in this “numerous and jumbled” real world. Consequently, an effective approach to author name disambiguation must not only make full use of bibliographic data but also introduce appropriate external resources.