Measurement of Text Similarity: A Survey

Jiapeng Wang; Yihong Dong

doi:10.3390/info11090421

Information (Aug 2020)

Measurement of Text Similarity: A Survey

Jiapeng Wang,
Yihong Dong

Affiliations

Jiapeng Wang: Computer Engineering Department, Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China
Yihong Dong: Computer Engineering Department, Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China

DOI: https://doi.org/10.3390/info11090421
Journal volume & issue: Vol. 11, no. 9
p. 421

Abstract

Read online

Text similarity measurement is the basis of natural language processing tasks, which play an important role in information retrieval, automatic question answering, machine translation, dialogue systems, and document matching. This paper systematically combs the research status of similarity measurement, analyzes the advantages and disadvantages of current methods, develops a more comprehensive classification description system of text similarity measurement algorithms, and summarizes the future development direction. With the aim of providing reference for related research and application, the text similarity measurement method is described by two aspects: text distance and text representation. The text distance can be divided into length distance, distribution distance, and semantic distance; text representation is divided into string-based, corpus-based, single-semantic text, multi-semantic text, and graph-structure-based representation. Finally, the development of text similarity is also summarized in the discussion section.

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/

About the journal

Abstract

Keywords