Известия высших учебных заведений. Поволжский регион:Технические науки (Jun 2024)

Variants of Zipf’s hyperbolic law for fractal description of probability-rank distribution of letters in English technical texts

  • A.I. Ivanov,
  • A.P. Ivanov,
  • A.P. Yunin,
  • R.V. Eremenko

DOI
https://doi.org/10.21685/2072-3059-2024-1-3
Journal volume & issue
no. 1

Abstract

Background. The study extends the scope of the classical hyperbolic Zipf’s law, which has previously been used to describe statistically the probability of occurrence of words in texts written in a European language. Materials and methods. Letters are ranked by their probability of occurrence in English texts 3799 characters long. Results. It is shown that the statistics of frequently used lowercase letters are easily separable from the statistics of uppercase letters. Even for short texts, ordering letter codes by their probability of occurrence yields another hyperbolic Zipf distribution, this time for letters and punctuation marks. Conclusions. The distribution of the lengths of words delimited by spaces on both sides, following Mandelbrot’s method, is given. In addition, “words” are selected between occurrences of ASCII code 101, which corresponds to the English letter “e”. It is shown that corresponding functionals with linear computational complexity can be constructed for all the other frequently used English letters: “t”, “a”, “i”, “r”. The result is a set of new statistical functionals for deeper analysis of texts in European languages. These functionals can also be used to evaluate the strength of long passphrases.
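
As an illustration only, and not the authors' implementation, the two kinds of statistics described in the abstract can be sketched in Python. The function names `rank_probabilities` and `segment_lengths` and the sample strings are assumptions introduced here. The sketch ranks symbols by probability of occurrence and, in a single pass over the text, collects the lengths of segments enclosed on both sides by a chosen delimiter (a space, or the letter with ASCII code 101).

```python
from collections import Counter


def rank_probabilities(text):
    """Return (symbol, probability) pairs sorted by decreasing probability.

    Ties are broken by symbol order so the ranking is deterministic.
    """
    counts = Counter(text)
    total = len(text)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(symbol, n / total) for symbol, n in ranked]


def segment_lengths(text, delimiter=" "):
    """Lengths of non-empty segments bounded by `delimiter` on both sides.

    One pass over the text, so the cost is linear in its length.
    """
    lengths = []
    last = None  # index of the most recently seen delimiter, if any
    for i, ch in enumerate(text):
        if ch == delimiter:
            if last is not None and i - last - 1 > 0:
                lengths.append(i - last - 1)
            last = i
    return lengths


# Hypothetical sample strings, chosen only to keep the example small.
ranking = rank_probabilities("aab")
word_lengths = segment_lengths("a bc def g", " ")
e_lengths = segment_lengths("the letter e", chr(101))  # chr(101) is "e"
```

Plotting the ranked probabilities against rank on log-log axes is the usual way to check visually for a hyperbolic (Zipf-like) shape. Changing `delimiter` to “t”, “a”, “i” or “r” gives the analogous segment-length statistics for those letters.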

Keywords