The Programming Historian (May 2020)

Understanding and Using Common Similarity Measures for Text Analysis

  • John R. Ladd

DOI
https://doi.org/10.46430/phen0089
Journal volume & issue
Vol. 9

Abstract

This tutorial focuses on measuring distance among texts. It describes the advantages and disadvantages of three of the most common distance measures: city block or “Manhattan” distance, Euclidean distance, and cosine distance. In this lesson, you will learn when to use one measure over another and how to calculate these distances using the SciPy library in Python.
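
As a rough illustration of the three measures named in the abstract, the sketch below computes each one with functions from scipy.spatial.distance. The two word-count vectors (text_a, text_b) are invented for this example and do not come from the lesson; the lesson itself may prepare and compare its texts differently.

```python
from scipy.spatial.distance import cityblock, cosine, euclidean

# Hypothetical word counts for two very short texts over a shared
# three-word vocabulary (illustrative values, not data from the lesson).
text_a = [2, 0, 1]
text_b = [1, 1, 0]

# City block ("Manhattan") distance: sum of absolute differences.
manhattan = cityblock(text_a, text_b)

# Euclidean distance: straight-line distance between the two vectors.
euclid = euclidean(text_a, text_b)

# Cosine distance: 1 minus the cosine of the angle between the vectors,
# so it depends on the direction of the vectors rather than their length.
cos = cosine(text_a, text_b)

print(manhattan, euclid, cos)
```

Because cosine distance ignores vector length, it is often less sensitive than the other two measures to differences in how long the texts are.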

Keywords