Information (Oct 2024)

Sentence Embeddings and Semantic Entity Extraction for Identification of Topics of Short Fact-Checked Claims

  • Krzysztof Węcel,
  • Marcin Sawiński,
  • Włodzimierz Lewoniewski,
  • Milena Stróżyna,
  • Ewelina Księżniak,
  • Witold Abramowicz

DOI
https://doi.org/10.3390/info15100659
Journal volume & issue
Vol. 15, no. 10
p. 659

Abstract

The objective of this research was to design a method for assigning topics to claims debunked by fact-checking agencies. Fact-checking requires access to more structured knowledge; therefore, we aim to describe topics with a semantic vocabulary. Topic classification should go beyond simple instance-class relations and instead reflect the broader phenomena that fact checkers recognize. The assignment of semantic entities is also crucial for the automatic verification of facts using the underlying knowledge graphs. Our method is based on sentence embeddings, dimensionality reduction with UMAP, clustering (HDBSCAN, K-means), semantic entity matching, and term importance assessment based on TF-IDF. We represent our topics in semantic space using Wikidata Q-ids, DBpedia, Wikipedia topics, YAGO, and other relevant ontologies. This entity-based approach also supports hierarchical navigation within topics. For evaluation, we compare topic modeling results with claims already tagged by fact checkers. The work presented in this paper is useful for researchers and practitioners interested in semantic topic modeling of fake news narratives.
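
As a rough illustration of the TF-IDF term importance step mentioned above, the following minimal Python sketch ranks terms per cluster by treating each cluster of claims as a single document. It is not the authors' implementation: the function name `top_terms_per_cluster`, the whitespace tokenization, the toy claims, and the cluster labels are all assumptions made for illustration, and the labels are taken as given rather than produced from sentence embeddings.

```python
import math
from collections import Counter, defaultdict


def top_terms_per_cluster(claims, labels, k=3):
    """Rank terms per cluster by TF-IDF, treating each cluster as one document."""
    # Pool the tokens of all claims that share a cluster label
    # (naive lowercase whitespace tokenization, an assumption of this sketch).
    cluster_tokens = defaultdict(list)
    for text, label in zip(claims, labels):
        cluster_tokens[label].extend(text.lower().split())

    n_clusters = len(cluster_tokens)

    # Document frequency: the number of clusters in which each term occurs.
    df = Counter()
    for tokens in cluster_tokens.values():
        df.update(set(tokens))

    result = {}
    for label, tokens in cluster_tokens.items():
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_clusters / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[label] = ranked[:k]
    return result


# Hypothetical toy claims and cluster labels, invented for this example.
claims = [
    "vaccine causes illness",
    "vaccine contains microchip",
    "election ballots were stolen",
    "election machines changed votes",
]
labels = [0, 0, 1, 1]
topics = top_terms_per_cluster(claims, labels, k=1)
```

In a fuller pipeline the labels would come from clustering sentence embeddings of the claims, and the top-ranked terms would then be candidates for matching against semantic entities such as Wikidata Q-ids.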

Keywords