The Programming Historian (Nov 2023)

Corpus Analysis with spaCy

  • Megan S. Kane

DOI
https://doi.org/10.46430/phen0113
Journal volume & issue
Vol. 12

Abstract

This lesson demonstrates how to use the Python library spaCy to analyze large collections of texts. It details the process of enriching a corpus through lemmatization, part-of-speech tagging, dependency parsing, and named entity recognition. Readers will learn how to analyze the linguistic annotations that spaCy produces in order to explore meaningful trends in language patterns across a set of texts.
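
As a rough illustration of the enrichment steps named in the abstract, the sketch below collects lemmas, part-of-speech tags, dependency relations, and named entities from documents processed by spaCy. It is not taken from the lesson itself: the helper names (`token_rows`, `entity_rows`, `pos_counts`, `annotate_corpus`) are hypothetical, and the choice of the `en_core_web_sm` model is an assumption that requires spaCy and that model to be installed separately.

```python
from collections import Counter


def token_rows(doc):
    """Collect per-token annotations from a spaCy Doc.

    Reads the lemma, coarse part-of-speech tag, dependency label,
    and syntactic head of each token.
    """
    return [
        {
            "text": token.text,
            "lemma": token.lemma_,
            "pos": token.pos_,
            "dep": token.dep_,
            "head": token.head.text,
        }
        for token in doc
    ]


def entity_rows(doc):
    """Return (entity text, entity label) pairs from a spaCy Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def pos_counts(docs):
    """Count part-of-speech tags across a collection of Docs."""
    counts = Counter()
    for doc in docs:
        counts.update(token.pos_ for token in doc)
    return counts


def annotate_corpus(texts, model_name="en_core_web_sm"):
    """Run a spaCy pipeline over an iterable of strings.

    Assumption: the named model is installed. spaCy is imported here
    rather than at module level so the helpers above can be used
    without it.
    """
    import spacy

    nlp = spacy.load(model_name)
    return list(nlp.pipe(texts))
```

With spaCy and the model available, `docs = annotate_corpus(texts)` followed by `pos_counts(docs)` would give a corpus-wide tally of part-of-speech tags, one simple way to compare language patterns across texts.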