An Empirical Evaluation of Document Embeddings and Similarity Metrics for Scientific Articles

Joaquin Gómez; Pere-Pau Vázquez

doi:10.3390/app12115664

Applied Sciences (Jun 2022)


Affiliations

Joaquin Gómez: Department of Computer Science, Universitat Politècnica de Catalunya, 08034 Barcelona, Spain
Pere-Pau Vázquez: ViRVIG Group, Department of Computer Science, Universitat Politècnica de Catalunya, 08034 Barcelona, Spain

Journal volume & issue: Vol. 12, no. 11, p. 5664

Abstract


The comparison of documents, as in article or patent search, bibliography recommendation systems, or the visualization of document collections, has a wide range of applications in several fields. One key task these problems have in common is the evaluation of a similarity metric. Many such metrics have been proposed in the literature, and deep learning techniques have lately gained a lot of popularity. However, it is difficult to determine how those metrics perform relative to one another. In this paper, we present a systematic empirical evaluation of several of the most popular similarity metrics when applied to research articles. We analyze the results of those metrics in two ways: with a synthetic test that uses scientific papers and Ph.D. theses, and in a real-world scenario where we evaluate their ability to cluster papers from different areas of research.
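As an illustration of what a document similarity metric computes, the sketch below implements one classic baseline: cosine similarity over raw term-frequency vectors, used to rank a small corpus against a query document. This is an assumption for illustration only. The abstract does not list which metrics the paper evaluates, and the function names (`term_frequencies`, `cosine_similarity`, `most_similar`) and the toy corpus are hypothetical. Embedding-based approaches would typically replace the sparse term counts with learned dense vectors while keeping the same cosine comparison.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens (a deliberately naive tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors.

    For non-negative counts the result lies in [0, 1]; an empty document scores 0.
    """
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, corpus: dict[str, str]) -> list[tuple[str, float]]:
    """Rank corpus documents (id -> text) by similarity to the query, highest first."""
    q = term_frequencies(query)
    scores = [
        (doc_id, cosine_similarity(q, term_frequencies(text)))
        for doc_id, text in corpus.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy corpus; "paper_a" shares two terms with the query, "paper_b" none.
    corpus = {
        "paper_a": "document embeddings for scientific articles",
        "paper_b": "clustering research papers by topic",
    }
    for doc_id, score in most_similar("embeddings of scientific documents", corpus):
        print(f"{doc_id}: {score:.3f}")
```

Note that this naive tokenizer treats "document" and "documents" as different terms; real pipelines usually add stemming or subword tokenization, which is one of the ways metrics can differ in practice.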

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
