Rapid, Reference-Free human genotype imputation with denoising autoencoders
Raquel Dias,
Doug Evans,
Shang-Fu Chen,
Kai-Yu Chen,
Salvatore Loguercio,
Leslie Chan,
Ali Torkamani
Affiliations
Raquel Dias
Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States; Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States; Department of Microbiology and Cell Science, University of Florida, Gainesville, United States
Doug Evans
Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States; Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
Shang-Fu Chen
Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States; Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
Kai-Yu Chen
Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States; Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
Salvatore Loguercio
Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States; Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
Leslie Chan
Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States; Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
Ali Torkamani
Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States; Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States
Genotype imputation is a foundational tool for population genetics. Standard statistical imputation approaches rely on the co-location of large whole-genome sequencing-based reference panels, powerful computing environments, and potentially sensitive genetic study data. This creates computational-resource and privacy-risk barriers to accessing cutting-edge imputation techniques. Moreover, the accuracy of current statistical approaches is known to degrade in regions of low and complex linkage disequilibrium. Artificial neural network-based imputation approaches may overcome these limitations by encoding complex genotype relationships in easily portable inference models. Here, we demonstrate an autoencoder-based approach for genotype imputation, using a large, commonly used reference panel and spanning the entirety of human chromosome 22. Our autoencoder-based genotype imputation strategy achieved superior imputation accuracy across the allele-frequency spectrum and across genomes of diverse ancestry, while delivering inference at least four times faster than standard imputation tools.