Data Science and Engineering (Mar 2024)

Uncovering Flat and Hierarchical Topics by Community Discovery on Word Co-occurrence Network

  • Eric Austin,
  • Shraddha Makwana,
  • Amine Trabelsi,
  • Christine Largeron,
  • Osmar R. Zaïane

DOI
https://doi.org/10.1007/s41019-023-00239-2
Journal volume & issue
Vol. 9, no. 1
pp. 41 – 61

Abstract

Topic modeling aims to discover latent themes in collections of text documents. It has applications across fields such as sociology, opinion analysis, and media studies. In such areas, it is essential to have easily interpretable, diverse, and coherent topics. An efficient topic modeling technique should accurately identify both flat and hierarchical topics; the latter are especially useful in disciplines where topics can be logically arranged into a tree. In this paper, we propose Community Topic, a novel algorithm that mines communities in word co-occurrence networks to produce topics. We evaluate the proposed approach using several metrics and compare it with standard baselines, confirming its good performance. Community Topic enables quick identification of flat topics and topic hierarchies, facilitating the on-demand exploration of sub- and super-topics. It also obtains good results on datasets in different languages.
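
The general idea of reading topics off a word co-occurrence network can be illustrated with a minimal Python sketch. It is not the Community Topic algorithm itself, whose details the abstract does not give. The assumptions here are all illustrative: the toy corpus, the hypothetical helpers `cooccurrence_network` and `communities`, document-level co-occurrence counts, and the `min_weight` threshold. Connected components of the thresholded graph serve as a deliberately simple stand-in for a real community-detection method.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(docs):
    """Count how often each pair of distinct words shares a document."""
    weights = defaultdict(int)
    for doc in docs:
        # Sorting gives every pair a canonical (u, v) order with u < v.
        for u, v in combinations(sorted(set(doc.split())), 2):
            weights[(u, v)] += 1
    return weights


def communities(weights, min_weight=2):
    """Group words joined by edges of weight >= min_weight.

    Assumption: connected components stand in for a proper
    community-detection algorithm on the weighted network.
    """
    adjacency = defaultdict(set)
    for (u, v), w in weights.items():
        if w >= min_weight:
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Toy corpus (hypothetical): two themes plus one document that mixes them.
docs = [
    "goal match team league",
    "team match goal coach",
    "vote election party senate",
    "party election vote poll",
    "match election",
]
topics = communities(cooccurrence_network(docs))
for words in topics:
    print(words)
```

Each returned group is a flat topic expressed as a word list. One plausible route to a hierarchy would be to repeat the same step inside each group with a stricter threshold, though the abstract does not say how the published method builds its tree.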

Keywords