Grasping the Anti-Modern Discourse on Europe in the Swiss Digitised Press, or can Text Mining Generate a Research Corpus from an Article Collection?

Estelle Bunout

doi:10.5334/johd.37

Journal of Open Humanities Data (Sep 2021)

Affiliations

Estelle Bunout: Department for Media History, Leibniz Centre for Contemporary History Potsdam, Potsdam

Journal volume & issue: Vol. 7

Abstract

In this paper, we discuss how different types of automatic annotation of digitised newspaper articles can be integrated into the iterative questioning of the source material and into the creation of research corpora from a collection of unstructured texts (held in a structured collection). We annotate a sizeable collection of 183,270 Swiss press articles, extracted via the impresso interface [1], using topic modelling (MALLET) [2] as well as a naïve Bayes classifier (script by Milan van Lange). Our methodological discussion explores how text mining can help identify historical discourses that are difficult to query with keywords because of their inherent ambiguity, and how to grasp such discourses in a large corpus. We argue that the automated annotations can provide a body of corroborating evidence for the discourse being sought, to be used as an intermediary, heuristic step in the analysis.
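The abstract names a naïve Bayes classifier as one of the annotation tools. The sketch below is not Milan van Lange's script, whose code is not reproduced here; it is a generic multinomial naïve Bayes with add-one (Laplace) smoothing, written in plain Python under the assumption that a small set of hand-labelled articles is available. The labels `anti_modern` and `other`, the toy training snippets, and the names `NaiveBayes`, `tokenize`, `TRAIN_DOCS` and `TRAIN_LABELS` are hypothetical placeholders, and the tokenisation deliberately ignores OCR noise and language-specific processing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenisation with light punctuation stripping.

    Hypothetical helper: a real pipeline would also handle OCR errors,
    stop words and language-specific normalisation.
    """
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, doc):
        """Unnormalised log posterior for each label."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_label / n_docs)  # log prior
            for tok in tokenize(doc):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / (total + v))
            scores[label] = score
        return scores

    def predict_proba(self, doc):
        """Turn log scores into probabilities, subtracting the max for stability."""
        scores = self.log_scores(doc)
        top = max(scores.values())
        exps = {label: math.exp(s - top) for label, s in scores.items()}
        z = sum(exps.values())
        return {label: e / z for label, e in exps.items()}

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy training data, invented for illustration only.
TRAIN_DOCS = [
    "decadence of europe and the loss of tradition",
    "materialism threatens the soul of the nation",
    "the federal council approved the railway budget",
    "cattle prices rose at the market in bern",
]
TRAIN_LABELS = ["anti_modern", "anti_modern", "other", "other"]

if __name__ == "__main__":
    clf = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
    articles = [
        "the council approved the budget",
        "europe threatened by decadence and materialism",
    ]
    # Rank unseen articles by the probability of the target label.
    ranked = sorted(articles, key=lambda a: clf.predict_proba(a)["anti_modern"], reverse=True)
    for text in ranked:
        print(f"{clf.predict_proba(text)['anti_modern']:.3f}  {text}")
```

Ranking a whole collection by `predict_proba(article)["anti_modern"]` would yield a shortlist for close reading, in the spirit of the intermediary, heuristic step described above: the score is one piece of corroborating evidence to be weighed alongside other annotations such as topic proportions, not a verdict on whether an article belongs to the discourse.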

Published in Journal of Open Humanities Data

ISSN: 2059-481X (Online)
Publisher: Ubiquity Press
Country of publisher: United Kingdom
LCC subjects: General Works: History of scholarship and learning. The humanities; Language and Literature
Website: https://openhumanitiesdata.metajnl.com/