eLife (Jan 2025)

Risk factors affecting polygenic score performance across diverse cohorts

  • Daniel Hui,
  • Scott Dudek,
  • Krzysztof Kiryluk,
  • Theresa L Walunas,
  • Iftikhar J Kullo,
  • Wei-Qi Wei,
  • Hemant Tiwari,
  • Josh F Peterson,
  • Wendy K Chung,
  • Brittney H Davis,
  • Atlas Khan,
  • Leah C Kottyan,
  • Nita A Limdi,
  • Qiping Feng,
  • Megan J Puckelwartz,
  • Chunhua Weng,
  • Johanna L Smith,
  • Elizabeth W Karlson,
  • Regeneron Genetics Center,
  • Penn Medicine BioBank,
  • Gail P Jarvik,
  • Marylyn D Ritchie

DOI
https://doi.org/10.7554/eLife.88149
Journal volume & issue
Vol. 12

Abstract


Apart from ancestry, personal or environmental covariates may contribute to differences in polygenic score (PGS) performance. We analyzed the effects of covariate stratification and interaction on body mass index (BMI) PGS (PGS_BMI) across four cohorts of European (N = 491,111) and African (N = 21,612) ancestry. Stratifying on binary covariates and quintiles for continuous covariates, 18/62 covariates had significant and replicable R² differences among strata. Covariates with the largest differences included age, sex, blood lipids, physical activity, and alcohol consumption, with R² being nearly double between best- and worst-performing quintiles for certain covariates. Twenty-eight covariates had significant PGS_BMI–covariate interaction effects, modifying PGS_BMI effects by nearly 20% per standard deviation change. We observed overlap between covariates that had significant R² differences among strata and those with interaction effects: across all covariates, their main effects on BMI were correlated with their maximum R² differences and interaction effects (0.56 and 0.58, respectively), suggesting high-PGS_BMI individuals have highest R² and increase in PGS effect. Using quantile regression, we show that the effect of PGS_BMI increases as BMI itself increases, and that these differences in effects are directly related to differences in R² when stratifying by different covariates. Given significant and replicable evidence for context-specific PGS_BMI performance and effects, we investigated ways to increase model performance taking into account nonlinear effects. Machine learning models (neural networks) increased relative model R² (mean 23%) across datasets. Finally, creating PGS_BMI directly from GxAge genome-wide association study effects increased relative R² by 7.8%. These results demonstrate that certain covariates, especially those most associated with BMI, significantly affect both PGS_BMI performance and effects across diverse cohorts and ancestries, and we provide avenues to improve model performance that consider these effects.
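
The quintile stratification described in the abstract can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration and is not taken from the study: the function names `r_squared` and `r2_by_quintile`, the effect sizes, and the use of a plain squared correlation rather than whatever covariate-adjusted R² the authors computed. The simulation builds in a PGS effect that grows by 20% per standard deviation of the covariate, so the R² of the BMI PGS should rise from the lowest to the highest quintile.

```python
import random
import statistics

def r_squared(x, y):
    """R² of a one-predictor linear fit (squared Pearson correlation)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def r2_by_quintile(pgs, bmi, covariate):
    """Split samples into covariate quintiles and return the PGS R² in each."""
    order = sorted(range(len(covariate)), key=covariate.__getitem__)
    n = len(order)
    results = []
    for q in range(5):
        idx = order[q * n // 5:(q + 1) * n // 5]
        results.append(r_squared([pgs[i] for i in idx], [bmi[i] for i in idx]))
    return results

# Simulated data (hypothetical effect sizes, not estimates from the paper).
rng = random.Random(0)
n = 50_000
pgs = [rng.gauss(0, 1) for _ in range(n)]
cov = [rng.gauss(0, 1) for _ in range(n)]
# Covariate main effect plus a PGS effect scaled by 20% per SD of the covariate.
bmi = [0.5 * c + 0.3 * p * (1 + 0.2 * c) + rng.gauss(0, 1)
       for p, c in zip(pgs, cov)]

r2 = r2_by_quintile(pgs, bmi, cov)
for q, value in enumerate(r2, start=1):
    print(f"quintile {q}: R2 = {value:.3f}")
```

In this toy setting the R² gap between the extreme quintiles comes entirely from the simulated interaction term, which mirrors the abstract's point that differences in PGS effect and differences in R² across strata are linked.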

Keywords