IEEE Access (Jan 2019)

Research on Topic Detection and Tracking for Online News Texts

  • Guixian Xu,
  • Yueting Meng,
  • Zhan Chen,
  • Xiaoyu Qiu,
  • Changzhi Wang,
  • Haishen Yao

DOI
https://doi.org/10.1109/ACCESS.2019.2914097
Journal volume & issue
Vol. 7
pp. 58407 – 58418

Abstract


With the rapid development of the Internet, the amount of available data has grown exponentially. On the one hand, the accumulation of big data provides a foundation for artificial intelligence. On the other hand, extracting knowledge of interest from such a huge volume of information has become a matter of general concern. Topic tracking can help people trace how topics develop across large and complex collections of online text. By effectively organizing large-scale news documents, this paper proposes a method for modeling the evolution of news topics over time, which makes it possible to track topics and their evolution in a news text set. First, the LDA (latent Dirichlet allocation) model is used to extract topics from news texts, and Gibbs sampling is used to estimate the model parameters. Topic mining with the K-means method serves as a point of comparison to highlight the advantages of LDA for topic discovery. Second, an improved single-pass algorithm is used to track news topics. The JS (Jensen-Shannon) divergence is used to measure topic similarity, and a time decay function is introduced to increase the similarity between topics that occur close together in time. Finally, the strength of each news topic and the changes in its content across different time windows are analyzed. Experiments show that the proposed method can effectively detect and track topics and clearly reflects the trend of topic evolution.
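The tracking step described above (single-pass clustering with a JS-divergence similarity adjusted by time decay) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact decay function, how similarity and decay are combined, the threshold, or how a tracked topic is represented. Here we assume exponential decay `exp(-decay_rate * |Δt|)`, a similarity of `(1 - JS) * decay` with JS computed in bits (so it lies in [0, 1]), a cluster centroid equal to the mean of its member topic-word distributions, and a threshold of 0.5. The names `single_pass`, `TopicCluster`, `decay_rate`, and `threshold` are hypothetical.

```python
import math
from dataclasses import dataclass, field


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def time_decayed_similarity(p, q, t_p, t_q, decay_rate=0.1):
    """Assumed form: (1 - JS) scaled by an exponential decay on the time gap."""
    return (1.0 - js_divergence(p, q)) * math.exp(-decay_rate * abs(t_p - t_q))


@dataclass
class TopicCluster:
    """A tracked topic: mean word distribution plus the latest time window seen."""
    centroid: list
    last_time: float
    members: list = field(default_factory=list)

    def add(self, dist, t, label):
        # Running mean keeps the centroid a valid probability distribution.
        n = len(self.members)
        self.centroid = [(c * n + d) / (n + 1) for c, d in zip(self.centroid, dist)]
        self.last_time = max(self.last_time, t)
        self.members.append(label)


def single_pass(topics, threshold=0.5, decay_rate=0.1):
    """topics: iterable of (label, word_distribution, time_window) in time order."""
    clusters = []
    for label, dist, t in topics:
        # Find the most similar existing tracked topic.
        best, best_sim = None, -1.0
        for c in clusters:
            sim = time_decayed_similarity(dist, c.centroid, t, c.last_time, decay_rate)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best.add(dist, t, label)  # continue an existing topic
        else:
            clusters.append(TopicCluster(list(dist), t, [label]))  # start a new topic
    return clusters


if __name__ == "__main__":
    # Toy LDA-style topic-word distributions over a 4-word vocabulary (made-up data).
    topics = [
        ("A1", [0.7, 0.2, 0.1, 0.0], 0),
        ("B1", [0.0, 0.1, 0.2, 0.7], 0),
        ("A2", [0.6, 0.3, 0.1, 0.0], 1),
        ("A3", [0.65, 0.25, 0.1, 0.0], 30),  # same content as A's centroid, much later
    ]
    clusters = single_pass(topics)
    print([c.members for c in clusters])  # → [['A1', 'A2'], ['B1'], ['A3']]
```

In this toy run, A2 joins A1 because the two distributions are close and only one window apart, while A3 starts a new topic: its content matches the A centroid exactly, but the 29-window gap decays the similarity to about 0.055, below the threshold. This illustrates how a time decay term favors linking topics that are close in time.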

Keywords