Entropy (Jun 2022)

Modeling Long-Range Dynamic Correlations of Words in Written Texts with Hawkes Processes

  • Hiroshi Ogura,
  • Yasutaka Hanada,
  • Hiromi Amano,
  • Masato Kondo

DOI
https://doi.org/10.3390/e24070858
Journal volume & issue
Vol. 24, no. 7
p. 858

Abstract


Previous studies have shown that words in written texts fall into two groups, called Type-I and Type-II words. Type-I words exhibit long-range dynamic correlations in written texts, whereas Type-II words show no dynamic correlations of any kind. Although the stochastic process that generates Type-II words has been identified as a superposition of Poisson point processes with various intensities, there is no definitive model for Type-I words. In this study, we introduce the Hawkes process, a type of self-exciting point process, as a candidate for the stochastic process that generates Type-I words; that is, the purpose of this study is to establish that the Hawkes process is useful for modeling the occurrence patterns of Type-I words in real written texts. We also discuss the relation between the Hawkes process and an existing model for Type-I words, in which the hierarchical structure of written texts is considered to play a central role in generating dynamic correlations.
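The abstract characterizes the Hawkes process only as self-exciting. As a minimal sketch, assuming the commonly used exponential kernel (the kernel and parameter values used in the paper are not stated here), the conditional intensity is λ(t) = μ + Σ_{t_i < t} α·exp(−β(t − t_i)): each occurrence temporarily raises the rate of further occurrences, while α = 0 reduces it to a single homogeneous Poisson process, the building block of the Type-II model. The code below simulates both cases with Ogata's thinning algorithm and compares the coefficient of variation (CV) of the intervals between occurrences. A CV near 1 is expected for a Poisson process, and clustering pushes it above 1. Treating word positions in a text as continuous time, and the helper names `simulate_hawkes` and `interval_cv`, are illustrative assumptions, not the authors' code.

```python
import math
import random
import statistics


def simulate_hawkes(mu, alpha, beta, t_max, seed=None):
    """Ogata thinning for lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).

    Assumed exponential kernel; alpha = 0 gives a homogeneous Poisson process with rate mu.
    """
    if not (mu > 0 and beta > 0 and 0 <= alpha < beta):
        raise ValueError("need mu > 0, beta > 0 and 0 <= alpha < beta (stationarity)")
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # sum of alpha * exp(-beta * (t - t_i)) over past events at time t
    while True:
        # Between events the intensity only decays, so its current value bounds it.
        lam_bar = mu + excitation
        w = rng.expovariate(lam_bar)
        t += w
        if t >= t_max:
            break
        excitation *= math.exp(-beta * w)  # decay the excitation to the candidate time
        lam_t = mu + excitation
        if rng.random() * lam_bar <= lam_t:  # accept with probability lam_t / lam_bar
            events.append(t)
            excitation += alpha  # self-excitation: the new event raises future intensity
    return events


def interval_cv(events):
    """Coefficient of variation of the gaps between consecutive events."""
    gaps = [b - a for a, b in zip(events, events[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)


if __name__ == "__main__":
    hawkes = simulate_hawkes(mu=0.1, alpha=0.8, beta=1.0, t_max=5000.0, seed=1)
    poisson = simulate_hawkes(mu=0.5, alpha=0.0, beta=1.0, t_max=5000.0, seed=1)
    print("Hawkes:  n =", len(hawkes), " CV =", round(interval_cv(hawkes), 2))
    print("Poisson: n =", len(poisson), " CV =", round(interval_cv(poisson), 2))
```

Setting `alpha=0` runs the same code path, so the Poisson baseline and the self-exciting case are compared under identical simulation logic. The requirement `alpha < beta` keeps the branching ratio α/β below 1, so the simulated process stays stationary instead of exploding.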

Keywords