IEEE Access (Jan 2024)

Event Graph-Based News Clustering: The Role of Named Entity-Centered Subgraphs

  • Basak Buluz Komecoglu,
  • Burcu Yilmaz

DOI
https://doi.org/10.1109/ACCESS.2024.3435343
Journal volume & issue
Vol. 12
pp. 105613 – 105632

Abstract


In an era of exponential growth in online news sources, intelligent digital solutions that can efficiently analyze and organize large volumes of news content have become crucial. This paper presents a graph-based methodology designed to enhance Topic Detection and Tracking (TDT) tasks in natural language processing by clustering news events into coherent stories. The proposed approach leverages a novel event graph model that captures not only the characteristics of individual news events but also their collective narrative context. Using Named Entity-Centered Frequent Subgraphs, the model excels at identifying recurring patterns of events. It thus provides a framework for learning a robust, language-independent, structured representation of news stories, which represents a significant advance in refining traditional clustering algorithms. Empirical experiments on a multilingual benchmark, the News Clustering Dataset, demonstrate superior clustering performance compared to state-of-the-art monolingual document clustering techniques, particularly in English, and competitive results in Spanish. To underline the adaptability of the methodology to low-resource languages, we also developed the Turkish ‘Story-Based News Dataset’ specifically for this study; it promises to serve as an important resource for a wide range of natural language processing tasks.
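The abstract does not detail how named-entity-centered subgraphs become clustering features. As a toy illustration only, and not the authors' pipeline, the sketch below assumes each document is already an event graph of hypothetical (subject, predicate, object) triples with a known set of named entities. It restricts "subgraphs" to single edges touching an entity (a drastic simplification of frequent subgraph mining), keeps edges that occur in at least `min_support` documents, and groups documents whose frequent-edge sets have Jaccard similarity at or above a hypothetical `threshold`. All names, data, and parameters are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy event graphs: doc id -> (event triples, named entities).
DOCS = {
    "d1": ([("Fed", "raises", "interest rates"), ("Powell", "signals", "further hikes")], {"Fed", "Powell"}),
    "d2": ([("Fed", "raises", "interest rates"), ("markets", "fall after", "Fed")], {"Fed"}),
    "d3": ([("Messi", "joins", "Inter Miami"), ("Messi", "scores for", "Inter Miami")], {"Messi", "Inter Miami"}),
    "d4": ([("Messi", "joins", "Inter Miami"), ("fans", "welcome", "Messi")], {"Messi"}),
}


def entity_edges(triples, entities):
    """Edges incident to any named entity (a 1-hop, entity-centered view)."""
    return {(s, p, o) for s, p, o in triples if s in entities or o in entities}


def frequent_patterns(doc_patterns, min_support):
    """Keep edge patterns that occur in at least `min_support` documents."""
    support = Counter(pat for pats in doc_patterns.values() for pat in pats)
    return {pat for pat, n in support.items() if n >= min_support}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, min_support=2, threshold=0.3):
    """Group documents whose frequent entity-centered edges overlap enough."""
    doc_patterns = {d: entity_edges(t, e) for d, (t, e) in docs.items()}
    frequent = frequent_patterns(doc_patterns, min_support)
    features = {d: pats & frequent for d, pats in doc_patterns.items()}

    # Union-find: link every pair of documents above the similarity threshold.
    parent = {d: d for d in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ids = sorted(docs)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(features[a], features[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for d in ids:
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())


if __name__ == "__main__":
    print(cluster(DOCS))  # [['d1', 'd2'], ['d3', 'd4']]
```

Here only the edges shared across documents ("Fed raises interest rates", "Messi joins Inter Miami") survive the support filter, so the two toy stories separate cleanly. A real system would mine larger multi-edge subgraphs and learn representations from them rather than compare raw edge sets.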

Keywords