The Programming Historian (Aug 2023)

Clustering and Visualising Documents using Word Embeddings

  • Jonathan Reades
  • Jennie Williams

DOI
https://doi.org/10.46430/phen0111
Journal volume & issue
Vol. 12

Abstract

This lesson uses word embeddings and clustering algorithms in Python to identify groups of similar documents in a corpus of approximately 9,000 academic abstracts. It will teach you the basics of dimensionality reduction for extracting structure from a large corpus and how to evaluate your results.