Comparison of methods for building polygenic scores for diverse populations
Sophia Gunn,
Xin Wang,
Daniel C. Posner,
Kelly Cho,
Jennifer E. Huffman,
Michael Gaziano,
Peter W. Wilson,
Yan V. Sun,
Gina Peloso,
Kathryn L. Lunetta
Affiliations
Sophia Gunn
Biostatistics, Boston University School of Public Health, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA; Corresponding author
Xin Wang
Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA; Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Daniel C. Posner
Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA, USA
Kelly Cho
Department of Medicine, Harvard Medical School, Boston, MA, USA; MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, USA; Department of Medicine, Division of Aging, Brigham and Women’s Hospital, Boston, MA 02115, USA
Jennifer E. Huffman
Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Palo Alto Veterans Institute for Research (PAVIR), Palo Alto Health Care System, Palo Alto, CA, USA
Michael Gaziano
Department of Medicine, Harvard Medical School, Boston, MA, USA; MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, USA; Department of Medicine, Division of Aging, Brigham and Women’s Hospital, Boston, MA 02115, USA
Peter W. Wilson
VA Atlanta Healthcare System, Decatur, GA, USA; Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA, USA; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
Yan V. Sun
VA Atlanta Healthcare System, Decatur, GA, USA; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
Gina Peloso
Biostatistics, Boston University School of Public Health, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA
Kathryn L. Lunetta
Biostatistics, Boston University School of Public Health, Boston, MA, USA
Summary: Polygenic scores (PGSs) are a promising tool for estimating individual-level genetic risk of disease based on the results of genome-wide association studies (GWASs). However, their promise has yet to be fully realized because most currently available PGSs were built with genetic data from predominantly European-ancestry populations, and PGS performance declines when scores are applied to target populations that differ from the populations in which they were derived. Thus, there is a great need to improve PGS performance in currently under-studied populations. In this work, we leverage data from two large and diverse cohorts, the Million Veterans Program (MVP) and All of Us (AoU), which give us a unique opportunity to compare methods for building PGSs for multi-ancestry populations across multiple traits. We build PGSs for five continuous traits and five binary traits using both multi-ancestry and single-ancestry approaches, applying popular Bayesian PGS methods to both MVP META GWAS results and population-specific GWAS results from the respective African, European, and Hispanic MVP populations. We evaluate these scores in three AoU populations genetically similar to the respective African, Admixed American, and European 1000 Genomes Project superpopulations. Using correlation-based tests, we formally compare PGS performance across the three AoU populations. We conclude that approaches that combine GWAS data from multiple populations produce PGSs that perform better than approaches that use smaller single-population GWAS results matched to the target population, and specifically that multi-ancestry scores built with PRS-CSx outperform the other approaches in all three AoU populations.
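To make the quantities in the summary concrete, the following is a minimal sketch of what a PGS is (a weighted sum of allele dosages) and of how two candidate scores might be compared by their correlation with a continuous trait. It is an illustration under stated assumptions, not the authors' pipeline: the function names `polygenic_score` and `pearson_r`, the simulated dosages, and both sets of weights are hypothetical, and the sketch only computes correlations rather than performing the formal correlation-based tests used in the study. In practice the per-variant weights would come from a PGS method such as PRS-CSx.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Return one score per individual: the weighted sum of allele dosages.

    dosages: array of shape (n_individuals, n_variants), values in {0, 1, 2}
    weights: array of shape (n_variants,), per-variant effect-size weights
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights


def pearson_r(x, y):
    """Pearson correlation between a score and a continuous trait."""
    return float(np.corrcoef(x, y)[0, 1])


# Simulated toy data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_individuals, n_variants = 500, 50
dosages = rng.integers(0, 3, size=(n_individuals, n_variants))
true_effects = rng.normal(0.0, 0.1, size=n_variants)
trait = dosages @ true_effects + rng.normal(0.0, 1.0, size=n_individuals)

# Two hypothetical weight sets: one estimated with little noise, one with more.
weights_a = true_effects + rng.normal(0.0, 0.02, size=n_variants)
weights_b = true_effects + rng.normal(0.0, 0.20, size=n_variants)

r_a = pearson_r(polygenic_score(dosages, weights_a), trait)
r_b = pearson_r(polygenic_score(dosages, weights_b), trait)
print(f"score A: r = {r_a:.3f}; score B: r = {r_b:.3f}")
```

Because both scores are evaluated in the same individuals, the two correlations are dependent, so a formal comparison needs a test for dependent correlations rather than a simple difference of the two values.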