Mathematics (Oct 2023)

A Unified Formal Framework for Factorial and Probabilistic Topic Modelling

  • Karina Gibert,
  • Yaroslav Hernandez-Potiomkin

DOI
https://doi.org/10.3390/math11204375
Journal volume & issue
Vol. 11, no. 20
p. 4375

Abstract


Topic modelling has become a highly popular technique for extracting knowledge from texts. It encompasses several families of methods, including Factorial methods, Probabilistic methods, and Natural Language Processing methods. This paper introduces a unified conceptual framework for Factorial and Probabilistic methods by identifying their shared elements and representing them in a homogeneous notation. The paper presents 12 different methods within this framework. This enables an easy comparative analysis of how flexible each approach is and how realistic its assumptions are. This constitutes the first stage of a broader analysis. That analysis aims to relate all method families to this common framework, to understand their strengths and weaknesses comprehensively, and to establish general application guidelines. In addition, an experimental setup reinforces the convenience of having a harmonized notational schema. The paper concludes with a discussion of the presented methods and outlines future research directions.

Keywords