Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

Natsuki Kadowaki; Kazuaki Kishida

doi:10.1633/jistap.2019.8.2.1

Journal of Information Science Theory and Practice (Jun 2020)

Affiliations

Natsuki Kadowaki: Keio University
Kazuaki Kishida: Keio University

DOI: https://doi.org/10.1633/jistap.2019.8.2.1
Journal volume & issue: Vol. 8, no. 2, pp. 6–17

Abstract

Word similarity is often measured to enhance system performance in information retrieval and related fields. This paper reports an experimental comparison of word similarity measures, with values computed for 50 intentionally selected words from a Reuters corpus. Three types of measures were compared: (1) co-occurrence-based similarity measures, in which co-occurrence frequency is counted as either the number of documents or the number of sentences; (2) context-based distributional similarity measures obtained from latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and the Word2Vec algorithm; and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). The Pearson correlation coefficient was highest between the VSM-based similarity measure and the co-occurrence-based similarity measure that counts documents. Group-average agglomerative hierarchical clustering was also applied to the similarity matrices computed by the individual measures. An evaluation of the resulting cluster sets against an answer set revealed that the VSM- and LDA-based similarity measures performed best.
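As a rough illustration of the comparison described in the abstract, the sketch below computes a document-level co-occurrence similarity and a tf-idf VSM similarity for a few word pairs, then correlates the two sets of values with a Pearson coefficient. It is not the authors' implementation: the toy corpus, the word list, the Jaccard coefficient for co-occurrence, the idf formula, and all function names are assumptions made for this example.

```python
import math
from itertools import combinations

# Toy corpus (hypothetical; the paper used a Reuters corpus).
docs = [
    "oil prices rise as crude supply falls",
    "crude oil exports fall",
    "bank raises interest rates",
    "interest rates and bank lending rise",
    "oil company seeks bank loan",
]
tokenized = [d.split() for d in docs]
words = ["oil", "crude", "bank", "interest", "rates"]  # assumed target words
n_docs = len(tokenized)


def doc_set(word):
    """Indices of the documents that contain the word."""
    return {i for i, toks in enumerate(tokenized) if word in toks}


def cooccurrence_sim(w1, w2):
    """Jaccard coefficient over document-level co-occurrence (assumed choice)."""
    a, b = doc_set(w1), doc_set(w2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_vector(word):
    """Represent a word as its tf-idf weights across documents (assumed idf form)."""
    df = len(doc_set(word))
    idf = math.log(n_docs / df) if df else 0.0
    return [toks.count(word) * idf for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# One similarity value per word pair, for each measure.
pairs = list(combinations(words, 2))
cooc = [cooccurrence_sim(a, b) for a, b in pairs]
vsm = [cosine(tfidf_vector(a), tfidf_vector(b)) for a, b in pairs]

r = pearson(cooc, vsm)
print(f"Pearson r between the two measures: {r:.3f}")
```

On such a small corpus every term frequency is 0 or 1, so the two measures are closely related by construction; the example only shows the mechanics of the comparison, not the paper's findings.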

Published in Journal of Information Science Theory and Practice

ISSN: 2287-9099 (Print); 2287-4577 (Online)
Publisher: Korea Institute of Science and Technology Information
Country of publisher: Korea, Republic of
LCC subjects: Bibliography. Library science. Information resources
Website: http://www.jistap.org/
