Visualizing population structure with variational autoencoders

C J Battey; Gabrielle C Coffing; Andrew D Kern

doi:10.1093/g3journal/jkaa036

G3: Genes, Genomes, Genetics (Jan 2021)

Affiliations

C J Battey: Department of Biology, University of Oregon Institute of Ecology and Evolution, Eugene, Oregon, 97403
Gabrielle C Coffing: Department of Biology, University of Oregon Institute of Ecology and Evolution, Eugene, Oregon, 97403
Andrew D Kern: Department of Biology, University of Oregon Institute of Ecology and Evolution, Eugene, Oregon, 97403

DOI: https://doi.org/10.1093/g3journal/jkaa036
Journal volume & issue: Vol. 11, no. 1

Abstract

Dimensionality reduction is a common tool for visualization and inference of population structure from genotypes, but popular methods either return too many dimensions for easy plotting (PCA) or fail to preserve global geometry (t-SNE and UMAP). Here we explore the utility of variational autoencoders (VAEs), generative machine learning models in which a pair of neural networks seek to first compress and then recreate the input data, for visualizing population genetic variation. VAEs incorporate nonlinear relationships, allow users to define the dimensionality of the latent space, and in our tests preserve global geometry better than t-SNE and UMAP. Our implementation, which we call popvae, is available as a command-line Python program at github.com/kr-colab/popvae. The approach yields latent embeddings that capture subtle aspects of population structure in humans and Anopheles
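
To make the compress-then-recreate idea concrete, the following is a minimal NumPy sketch of a single untrained VAE forward pass with a user-chosen two-dimensional latent space. It is not popvae's code: the layer sizes, the function names (`init_params`, `vae_forward`, `vae_loss`), the random toy genotype matrix, and the scaling of 0/1/2 allele counts to [0, 1] are all assumptions made for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_sites, hidden, latent_dim, rng):
    # Hypothetical single-hidden-layer encoder and decoder (sizes are assumptions).
    s = 0.1
    return {
        "enc_w": rng.normal(0, s, (n_sites, hidden)),
        "mu_w": rng.normal(0, s, (hidden, latent_dim)),
        "logvar_w": rng.normal(0, s, (hidden, latent_dim)),
        "dec_w1": rng.normal(0, s, (latent_dim, hidden)),
        "dec_w2": rng.normal(0, s, (hidden, n_sites)),
    }

def vae_forward(x, p, rng):
    # Encoder: compress each sample to the mean and log-variance of a latent Gaussian.
    h = relu(x @ p["enc_w"])
    mu = h @ p["mu_w"]
    logvar = h @ p["logvar_w"]
    # Reparameterization: draw a latent point as mu + sigma * standard normal noise.
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    # Decoder: recreate the input from the latent point.
    x_hat = sigmoid(relu(z @ p["dec_w1"]) @ p["dec_w2"])
    return mu, logvar, x_hat

def vae_loss(x, mu, logvar, x_hat, eps=1e-7):
    # Reconstruction error (binary cross-entropy) plus KL divergence to N(0, I).
    recon = -np.sum(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps), axis=1)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)
    return np.mean(recon + kl)

rng = np.random.default_rng(0)
# Toy data (not real genotypes): 20 samples x 50 sites, allele counts 0/1/2 scaled to [0, 1].
genotypes = rng.integers(0, 3, size=(20, 50)) / 2.0
params = init_params(n_sites=50, hidden=16, latent_dim=2, rng=rng)
mu, logvar, x_hat = vae_forward(genotypes, params, rng)
loss = vae_loss(genotypes, mu, logvar, x_hat)
```

After training (omitted here), the rows of `mu` would be the two-dimensional coordinates one plots for each sample, and passing new latent points through the decoder would yield generated genotype probabilities.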

Published in G3: Genes, Genomes, Genetics

ISSN: 2160-1836 (Online)
Publisher: Oxford University Press
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Genetics
Website: https://academic.oup.com/g3journal
