The Programming Historian (May 2020)
Understanding and Using Common Similarity Measures for Text Analysis
Abstract
This lesson focuses on measuring distance among texts. It describes the advantages and disadvantages of three of the most common distance measures: city block or “Manhattan” distance, Euclidean distance, and cosine distance. You will learn when to use one measure over another and how to calculate these distances using the SciPy library in Python.
Keywords