PLoS ONE (Jan 2014)

Exploring empirical rank-frequency distributions longitudinally through a simple stochastic process.

  • Benjamin J Finley,
  • Kalevi Kilkki

DOI
https://doi.org/10.1371/journal.pone.0094920
Journal volume & issue
Vol. 9, no. 4
p. e94920

Abstract

The frequent appearance of empirical rank-frequency laws, such as Zipf's law, in a wide range of domains reinforces the importance of understanding and modeling these laws and rank-frequency distributions in general. In this spirit, we utilize a simple stochastic cascade process to simulate several empirical rank-frequency distributions longitudinally. We focus especially on limiting the process's complexity to increase accessibility for non-experts in mathematics. The process provides a good fit for many empirical distributions for two reasons: its stochastic multiplicative nature leads to the often-observed concave rank-frequency distribution (on a log-log scale), and the finiteness of the cascade replicates real-world finite-size effects. Furthermore, we show that repeated trials of the process can roughly simulate the longitudinal variation of empirical ranks. However, we find that the empirical variation is often less than the average simulated process variation, likely due to longitudinal dependencies in the empirical datasets. Finally, we discuss the limitations and practical applications of the process.
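
To make the idea of a finite stochastic multiplicative cascade concrete, the following is a minimal illustrative sketch and not the authors' exact process, which the abstract does not specify. It assumes a binary cascade in which a unit mass is repeatedly split into two parts by a uniformly distributed random fraction; the names `cascade_shares` and `simulate_trials`, the uniform split, and the `depth` and `trials` values are all assumptions made for illustration. Sorting the final shares in descending order gives one simulated rank-frequency distribution, and repeating the process gives a rough picture of how the share at each rank varies between trials.

```python
import random


def cascade_shares(depth, rng):
    """Run one finite binary multiplicative cascade (illustrative assumption).

    Starts from a unit mass and splits every piece into two parts `depth`
    times, returning the 2**depth final shares sorted from largest to smallest.
    """
    shares = [1.0]
    for _ in range(depth):
        next_shares = []
        for share in shares:
            u = rng.random()  # split fraction; uniform on [0, 1) is an assumption
            next_shares.append(share * u)
            next_shares.append(share * (1.0 - u))
        shares = next_shares
    return sorted(shares, reverse=True)


def simulate_trials(depth, trials, seed=0):
    """Repeat the cascade to see how the share at each rank varies by trial."""
    rng = random.Random(seed)
    return [cascade_shares(depth, rng) for _ in range(trials)]


# Hypothetical parameters: 6 levels (64 items) and 5 independent trials.
results = simulate_trials(depth=6, trials=5, seed=42)
for rank in (1, 2, 4, 8, 16, 32, 64):
    values = [trial[rank - 1] for trial in results]
    print(rank, min(values), max(values))
```

Plotting the sorted shares against rank on log-log axes would be the natural next step for comparing such a simulation with an empirical dataset. The finite `depth` caps the number of items, which is one simple way to mimic a finite-size effect.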