Imputation Performance in Latin American Populations: Improving Rare Variants Representation With the Inclusion of Native American Genomes

Andrés Jiménez-Kaufmann; Amanda Y. Chong; Adrián Cortés; Consuelo D. Quinto-Cortés; Selene L. Fernandez-Valverde; Leticia Ferreyra-Reyes; Luis Pablo Cruz-Hervert; Santiago G. Medina-Muñoz; Mashaal Sohail; María J. Palma-Martinez; Gudalupe Delgado-Sánchez; Norma Mongua-Rodríguez; Alexander J. Mentzer; Adrian V. S. Hill; Hortensia Moreno-Macías; Alicia Huerta-Chagoya; Carlos A. Aguilar-Salinas; Michael Torres; Hie Lim Kim; Namrata Kalsi; Stephan C. Schuster; Teresa Tusié-Luna; Diego Ortega Del-Vecchyo; Lourdes García-García; Andrés Moreno-Estrada

doi:10.3389/fgene.2021.719791

Frontiers in Genetics (Jan 2022)

Affiliations

Andrés Jiménez-Kaufmann: Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
Amanda Y. Chong: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
Adrián Cortés: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
Consuelo D. Quinto-Cortés: Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
Selene L. Fernandez-Valverde: Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
Leticia Ferreyra-Reyes: Instituto Nacional de Salud Pública, Cuernavaca, Mexico
Luis Pablo Cruz-Hervert: Instituto Nacional de Salud Pública, Cuernavaca, Mexico
Santiago G. Medina-Muñoz: Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
Mashaal Sohail: Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
Mashaal Sohail: Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
María J. Palma-Martinez: Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
Gudalupe Delgado-Sánchez: Instituto Nacional de Salud Pública, Cuernavaca, Mexico
Norma Mongua-Rodríguez: Instituto Nacional de Salud Pública, Cuernavaca, Mexico
Alexander J. Mentzer: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
Adrian V. S. Hill: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
Adrian V. S. Hill: Nuffield Department of Medicine, The Jenner Institute, University of Oxford, Oxford, United Kingdom
Hortensia Moreno-Macías: Unidad de Biología Molecular y Medicina Genómica, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán (INCMNSZ), Mexico City, Mexico
Hortensia Moreno-Macías: Departamento de Economía, Universidad Autónoma Metropolitana, Mexico City, Mexico
Alicia Huerta-Chagoya: Unidad de Biología Molecular y Medicina Genómica, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán (INCMNSZ), Mexico City, Mexico
Carlos A. Aguilar-Salinas: Departamento de Endocrinología y Metabolismo, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Unidad de Investigación de Enfermedades Metabólicas, Mexico City, Mexico
Carlos A. Aguilar-Salinas: Tecnológico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, Mexico
Michael Torres: Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
Hie Lim Kim: Singapore Centre on Environmental Life Sciences Engineering, Nanyang Technological University, Singapore
Hie Lim Kim: GenomeAsia 100K (GA100K) Consortium, Singapore
Hie Lim Kim: School of Biological Science, Nanyang Technological University, Singapore
Namrata Kalsi: Singapore Centre on Environmental Life Sciences Engineering, Nanyang Technological University, Singapore
Namrata Kalsi: GenomeAsia 100K (GA100K) Consortium, Singapore
Stephan C. Schuster: Singapore Centre on Environmental Life Sciences Engineering, Nanyang Technological University, Singapore
Stephan C. Schuster: GenomeAsia 100K (GA100K) Consortium, Singapore
Stephan C. Schuster: School of Biological Science, Nanyang Technological University, Singapore
Teresa Tusié-Luna: Unidad de Biología Molecular y Medicina Genómica, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán (INCMNSZ), Mexico City, Mexico
Teresa Tusié-Luna: Instituto de Investigaciones Biomédicas de la UNAM, Mexico City, Mexico
Diego Ortega Del-Vecchyo: Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), UNAM, Juriquilla, Mexico
Lourdes García-García: Instituto Nacional de Salud Pública, Cuernavaca, Mexico
Andrés Moreno-Estrada: Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico

Journal volume & issue: Vol. 12

Abstract

Current Genome-Wide Association Studies (GWAS) rely on genotype imputation to increase statistical power, improve fine-mapping of association signals, and facilitate meta-analyses. Because of the complex demographic history of Latin America and the lack of balanced representation of Native American genomes in current imputation panels, locally relevant disease variants are likely to be missed, limiting the scope and impact of biomedical research in these populations. Better representation of diversity in genomic databases is therefore a scientific imperative. Here, we expand the 1000 Genomes Project reference panel (1KGP) with 134 Native American genomes (1KGP + NAT) to assess imputation performance in Latin American individuals of mixed ancestry. Our panel increased the number of SNPs above the GWAS quality threshold, thus improving statistical power for association studies in the region. It also increased imputation accuracy, particularly for low-frequency variants segregating in Native American ancestry tracts. The improvement is subtle but consistent across countries and proportional to the number of genomes added from local source populations. To project the potential improvement with a larger number of reference genomes, we performed simulations and found that at least 3,000 Native American genomes are needed to match the imputation performance for variants in European ancestry tracts. This reflects the concerning imbalance of diversity in current references and highlights the contribution of our work to reducing it while complementing efforts to improve global equity in genomic research.

Published in Frontiers in Genetics

ISSN: 1664-8021 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Biology (General): Genetics
Website: http://journal.frontiersin.org/journal/genetics
