Entities as Topic Labels: Combining Entity Linking and Labeled LDA to Improve Topic Interpretability and Evaluability

Anne Lauscher; Pablo Ruiz Fabo; Federico Nanni; Simone Paolo Ponzetto

doi:10.4000/ijcol.392

IJCoL (Dec 2016)

DOI: https://doi.org/10.4000/ijcol.392
Journal volume & issue: Vol. 2, no. 2, pp. 67–87

Abstract

Digital humanities scholars need corpus exploration methods whose topics are easier to interpret than those produced by standard LDA topic models. Towards this goal, we propose combining two techniques, Entity Linking and Labeled LDA. Our method first identifies, in an ontology, a set of descriptive labels for each document in a corpus, and then generates a specific topic for each label. The direct relation between topics and labels makes interpretation easier, and using an ontology as background knowledge limits label ambiguity. Because our topics are described by a limited number of clear-cut labels, they promote interpretability and support quantitative evaluation of the results. We illustrate the potential of the approach by applying it to three datasets: transcriptions of speeches from the fifth mandate of the European Parliament, the Enron Corpus, and the Hillary Clinton Email Dataset. While some of these resources have already been adopted by the natural language processing community, they still hold great potential for humanities scholars, part of which could be exploited in studies that adopt the fine-grained exploration method presented in this paper.
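
The two-step idea described in the abstract (link each document to ontology labels, then learn one topic per label) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy dictionary `TOY_ONTOLOGY` stands in for a real entity linker, the three example documents are invented, and the sampler is a simplified collapsed Gibbs sampler in which each token may only be assigned to a label of its own document, which is the core constraint of Labeled LDA. The hyperparameters `alpha` and `beta` and the `UNLABELED` fallback are likewise hypothetical choices.

```python
import random
from collections import defaultdict

# Hypothetical stand-in for an entity linker: surface form -> ontology label.
TOY_ONTOLOGY = {
    "parliament": "European_Parliament",
    "enron": "Enron",
    "email": "Email",
}


def link_entities(tokens):
    """Return the sorted ontology labels whose surface form occurs in tokens."""
    return sorted({TOY_ONTOLOGY[t] for t in tokens if t in TOY_ONTOLOGY})


def labeled_lda(docs, doc_labels, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Simplified collapsed Gibbs sampler: one topic per label, and each
    token can only be assigned to a label attached to its own document."""
    rng = random.Random(seed)
    # Assumption: documents without any linked entity get a fallback label.
    doc_labels = [labels or ["UNLABELED"] for labels in doc_labels]
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [defaultdict(int) for _ in docs]        # document-label counts
    n_kw = defaultdict(lambda: defaultdict(int))   # label-word counts
    n_k = defaultdict(int)                         # tokens per label

    def bump(d, k, w, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initialisation restricted to each document's label set.
    z = []
    for d, (doc, labels) in enumerate(zip(docs, doc_labels)):
        z.append([rng.choice(labels) for _ in doc])
        for w, k in zip(doc, z[d]):
            bump(d, k, w, +1)

    for _ in range(n_iter):
        for d, (doc, labels) in enumerate(zip(docs, doc_labels)):
            for i, w in enumerate(doc):
                bump(d, z[d][i], w, -1)
                weights = [
                    (n_dk[d][k] + alpha)
                    * (n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
                    for k in labels
                ]
                z[d][i] = rng.choices(labels, weights=weights)[0]
                bump(d, z[d][i], w, +1)

    # For each label, list its words by decreasing count (ties alphabetical).
    return {
        k: [w for w in sorted(ws, key=lambda w: (-ws[w], w)) if ws[w] > 0]
        for k, ws in n_kw.items()
    }


# Invented toy corpus, for illustration only.
docs = [
    "parliament vote budget vote".split(),
    "enron energy trading energy".split(),
    "parliament enron email inquiry".split(),
]
doc_labels = [link_entities(doc) for doc in docs]
topics = labeled_lda(docs, doc_labels)
for label, words in sorted(topics.items()):
    print(label, words)
```

Because every topic is tied to exactly one ontology label, a word such as "vote", which occurs only in a document labeled `European_Parliament`, can never end up in the `Enron` topic. This is the property that makes the resulting topics directly nameable and, as the abstract argues, easier to evaluate.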

Published in IJCoL

ISSN: 2499-4553 (Online)
Publisher: Accademia University Press
Country of publisher: Italy
LCC subjects: Social Sciences; Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing
Website: https://journals.openedition.org/ijcol