PLoS ONE (Jan 2019)
Lexical Landscapes as large in silico data for examining advanced properties of fitness landscapes.
Abstract
In silico approaches have played a central role in the development of evolutionary theory for generations. This is especially true of the fitness landscape, one of the most important abstractions in evolutionary genetics, and one that has benefited from large empirical data sets only in the last decade or so. In this study, we propose a method for generating enormous data sets that walk the line between in silico and empirical: word usage frequencies as catalogued by the Google ngram corpora. These data can be codified or analogized as a multidimensional empirical fitness landscape for examining advanced concepts: adaptive landscape by environment interactions, clonal competition, higher-order epistasis, and countless others. We argue that the broader Lexical Landscapes approach can serve as a platform that offers an astronomical number of fitness landscapes for exploration (at least) or theoretical formalism (potentially) in evolutionary biology.
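As a rough illustration of how word frequencies might be codified as a fitness landscape, the following minimal Python sketch treats each letter position as a locus, two same-length words as the ancestral and derived genotypes, and a word's usage frequency as its fitness. The abstract does not specify this encoding, so it is an assumption, as are the function names `build_landscape` and `local_peaks`. The frequency values are made-up placeholders chosen for illustration, not Google ngram data.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def build_landscape(start, end, frequency):
    """Map every letter combination between two words to a fitness value.

    Assumed encoding: each position is a biallelic locus carrying either
    the letter from `start` or the letter from `end`. Strings missing
    from `frequency` are given fitness 0.0.
    """
    if len(start) != len(end):
        raise ValueError("words must have the same length")
    # dict.fromkeys removes duplicates that arise when the two words
    # share a letter at some position, while preserving order.
    genotypes = dict.fromkeys("".join(g) for g in product(*zip(start, end)))
    return {g: float(frequency.get(g, 0.0)) for g in genotypes}


def local_peaks(landscape):
    """Genotypes strictly fitter than all of their one-letter neighbors."""
    peaks = []
    for g, fitness in landscape.items():
        neighbors = [h for h in landscape if hamming(g, h) == 1]
        if all(fitness > landscape[h] for h in neighbors):
            peaks.append(g)
    return peaks


# Hypothetical placeholder frequencies, NOT real ngram counts.
toy_frequency = {"WORD": 5.0, "GENE": 3.0, "GORE": 2.0, "WORE": 1.0}

landscape = build_landscape("WORD", "GENE", toy_frequency)
peaks = local_peaks(landscape)
print(len(landscape), sorted(peaks))
```

Under this encoding, two four-letter words give a landscape of up to 2^4 = 16 genotypes. Swapping in frequencies from a different year or corpus would give a different landscape over the same genotypes, which is one way to read the "landscape by environment" idea mentioned above.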