Nongye tushu qingbao xuebao (Oct 2022)
A Survey of Author Name Disambiguation Techniques of Academic Papers
Abstract
[Purpose/Significance] This paper surveys recent research on author name disambiguation and traces its development from the perspective of how data shape disambiguation methods, so as to provide a reference for further research. [Method/Process] Papers on author name disambiguation were collected from English-language databases (Web of Science, Scopus, Google Scholar, ACM Digital Library, IEEE Xplore, ScienceDirect and Springer Link) and Chinese-language databases (CNKI, CQVIP and WANFANG). The search results cover papers published from 1998 to 2021. Balancing authority, influence and novelty, 46 publications were selected for review. Author name disambiguation data come in many types and structures. For example, literature feature information is generally presented as unstructured text, and the features extracted from it can be stored and represented in two-dimensional tables; citation information and interpersonal relationships are network relational data, which can be stored and represented as graphs, key-value pairs or two-dimensional tables. The fundamental reason for the different data structures lies in their semantic differences, but the data structure itself determines which algorithms are applicable. According to the structure of the feature data used in the disambiguation task and the corresponding data processing algorithms, the research is divided into three categories: 1) disambiguation methods based on literature features, 2) disambiguation methods based on social networks and 3) disambiguation methods that integrate external knowledge. The impact of data on disambiguation methods is examined at the data level. [Results/Conclusions] The analysis found that, as the technology has progressed, deep learning methods have come into wide use. 
Compared with improvements to the model itself, feature learning and representation based on deep learning can significantly improve the performance of author name disambiguation algorithms. In addition, to overcome the insufficient use of data by any single method and to use data more efficiently, the three categories of methods are increasingly being combined so that they complement one another. The surveyed literature contains few studies on incremental author name disambiguation or multi-language author name disambiguation, which could be directions for further research.
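The data structures mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed papers: the records, author names, venue labels and function names are all hypothetical, chosen only to show literature features held in a two-dimensional table and co-author relationships held as an edge table and as key-value pairs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records sharing the ambiguous name "Wei Wang"; all values are invented.
papers = [
    {"paper_id": "p1", "title": "Graph embedding for name disambiguation",
     "venue": "Venue A", "authors": ["Wei Wang", "Li Na", "Chen Bo"]},
    {"paper_id": "p2", "title": "Clustering authors with co-author networks",
     "venue": "Venue A", "authors": ["Wei Wang", "Li Na"]},
    {"paper_id": "p3", "title": "Soil nitrogen dynamics in rice paddies",
     "venue": "Venue B", "authors": ["Wei Wang", "Zhao Min"]},
]


def feature_table(records):
    """Two-dimensional table: one row per paper, one column per extracted feature."""
    return [(r["paper_id"], r["title"].lower().split(), r["venue"]) for r in records]


def coauthor_edges(records):
    """Network data as an edge table with rows (author_a, author_b, paper_id)."""
    edges = []
    for r in records:
        for a, b in combinations(sorted(r["authors"]), 2):
            edges.append((a, b, r["paper_id"]))
    return edges


def adjacency(edges):
    """The same network as key-value pairs: author -> set of co-authors."""
    graph = defaultdict(set)
    for a, b, _ in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


table = feature_table(papers)   # input for feature-based methods
edges = coauthor_edges(papers)  # input for network-based methods
graph = adjacency(edges)
```

In this toy data the table rows would feed a feature-based method (for example, comparing title words or venues), while the edge table and adjacency mapping would feed a network-based method; papers p1 and p2 share the co-author "Li Na", whereas p3 does not.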
Keywords