Word-Length Correlations and Memory in Large Texts: A Visibility Network Analysis

Lev Guzmán-Vargas; Bibiana Obregón-Quintana; Daniel Aguilar-Velázquez; Ricardo Hernández-Pérez; Larry S. Liebovitch

doi:10.3390/e17117798

Entropy (Nov 2015)

Affiliations

Lev Guzmán-Vargas: Unidad Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Av. IPN No. 2580, L. Ticomán, México D.F., 07340, Mexico
Bibiana Obregón-Quintana: Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad Universitaria, México D.F., 04510, Mexico
Daniel Aguilar-Velázquez: Unidad Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, Av. IPN No. 2580, L. Ticomán, México D.F., 07340, Mexico
Ricardo Hernández-Pérez: Departamento de Física, Escuela Superior de Física y Matemáticas, Instituto Politécnico Nacional, Edif. No. 9 U.P. Zacatenco, México D.F., 07738, Mexico
Larry S. Liebovitch: Departments of Physics and Psychology, Queens College, City University of New York, 65-30 Kissena Boulevard, SB B322, Flushing, NY 11367, USA

DOI: https://doi.org/10.3390/e17117798
Journal volume & issue: Vol. 17, no. 11, pp. 7798–7810

Abstract

We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then, it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, P(k) ∼ k^(−γ), with two regimes, which are characterized by the exponents γ_s ≈ 1.7 (at short degree scales) and γ_l ≈ 1.3 (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.
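
The pipeline described in the abstract (word lengths, integration, natural visibility graph, degree distribution) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the tokenizing regular expression, and the choice to subtract the mean before integrating are all assumptions made here. The visibility rule used is the standard NVG criterion: two points are linked when every point between them lies strictly below the straight line joining them.

```python
import re
from collections import Counter


def word_lengths(text):
    """Map a text to the sequence of its word lengths.

    Assumption: a word is a run of letters or apostrophes.
    """
    return [len(w) for w in re.findall(r"[A-Za-z']+", text)]


def integrate(series):
    """Cumulative sum of the series.

    Assumption: the mean is subtracted first, as is common for
    profile construction; the abstract only says "integrated".
    """
    mean = sum(series) / len(series)
    total, profile = 0.0, []
    for x in series:
        total += x - mean
        profile.append(total)
    return profile


def visibility_degrees(y):
    """Degree of each node in the natural visibility graph of y.

    Node b is visible from node a (a < b) when the slope from a to b
    is strictly greater than the slope from a to every point between
    them. Worst-case cost is O(n^2), so this is a sketch for short
    series rather than for whole books.
    """
    n = len(y)
    degree = [0] * n
    for a in range(n - 1):
        max_slope = float("-inf")  # steepest slope seen so far from a
        for b in range(a + 1, n):
            slope = (y[b] - y[a]) / (b - a)
            if slope > max_slope:
                degree[a] += 1
                degree[b] += 1
                max_slope = slope
    return degree


def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes having degree k."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}
```

A possible use is `degree_distribution(visibility_degrees(integrate(word_lengths(text))))`, where `text` holds the plain text of a book. Estimating the exponents γ_s and γ_l would then require fitting P(k) on logarithmic axes over the two degree ranges, which is not shown here.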

Published in Entropy

ISSN: 1099-4300 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics
Website: http://www.mdpi.com/journal/entropy
