Journal of Open Humanities Data (Feb 2023)

A Global Lexical Database (GLED) for Computational Historical Linguistics

  • Tiago Tresoldi

DOI
https://doi.org/10.5334/johd.96
Journal volume & issue
Vol. 9
pp. 2 – 2

Abstract


This work presents a lexical database with cognate annotation and phonological alignment for over 6,500 documented language varieties. The database includes per-family and global phylogenetic resources, among them a pre-computed global tree of distances between language varieties, derived from normalized trees obtained with Bayesian Markov Chain Monte Carlo (MCMC) inference. For ease of use, the lexical data is provided in a single tabular file, and all resources are built following best practices and state-of-the-art algorithms in historical linguistics. The database is a convenient source for research prototypes, method development, and bootstrapping analyses. All resources are freely available for download.

Keywords