Improving genetic risk prediction across diverse population by disentangling ancestry representations

Prashnna K. Gyawali; Yann Le Guen; Xiaoxia Liu; Michael E. Belloy; Hua Tang; James Zou; Zihuai He

doi:10.1038/s42003-023-05352-6

Communications Biology (Sep 2023)

Affiliations

Prashnna K. Gyawali: Department of Neurology and Neurological Sciences, Stanford University
Yann Le Guen: Department of Neurology and Neurological Sciences, Stanford University
Xiaoxia Liu: Department of Neurology and Neurological Sciences, Stanford University
Michael E. Belloy: Department of Neurology and Neurological Sciences, Stanford University
Hua Tang: Department of Genetics, Stanford University
James Zou: Department of Biomedical Data Science, Stanford University
Zihuai He: Department of Neurology and Neurological Sciences, Stanford University

DOI: https://doi.org/10.1038/s42003-023-05352-6
Journal volume & issue: Vol. 6, no. 1
pp. 1 – 9

Abstract

Risk prediction models using genetic data have seen increasing traction in genomics. However, most polygenic risk models were developed using data from participants with similar (mostly European) ancestry. This can bias the risk predictors, resulting in poor generalization when they are applied to minority populations and admixed individuals such as African Americans. To address this issue, which arises largely because the prediction models are biased by the underlying population structure, we propose a deep-learning framework that leverages data from diverse populations and disentangles ancestry from the phenotype-relevant information in its representation. The ancestry-disentangled representation can be used to build risk predictors that perform better across minority populations. We applied the proposed method to the analysis of Alzheimer’s disease genetics. Compared with standard linear and nonlinear risk prediction methods, the proposed method substantially improves risk prediction in minority populations, including admixed individuals, without needing self-reported ancestry information.
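As a rough illustration of what separating ancestry from a genetic feature can mean, the sketch below uses plain linear residualization. This is a deliberately simplified stand-in and not the deep-learning framework the abstract describes. The toy data and the names `ancestry`, `signal`, `feature`, and `residualize` are all hypothetical and chosen only for this example.

```python
# Illustrative sketch only: linear residualization on made-up data.
# This is NOT the paper's model; it just shows a feature losing its
# linear dependence on an ancestry score.

def mean(xs):
    return sum(xs) / len(xs)

def residualize(feature, ancestry):
    """Return the part of `feature` not linearly predictable from `ancestry`."""
    f_bar, a_bar = mean(feature), mean(ancestry)
    cov = sum((f - f_bar) * (a - a_bar) for f, a in zip(feature, ancestry))
    var = sum((a - a_bar) ** 2 for a in ancestry)
    slope = cov / var  # least-squares slope of feature on ancestry
    return [(f - f_bar) - slope * (a - a_bar) for f, a in zip(feature, ancestry)]

def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = sum((x - x_bar) ** 2 for x in xs) ** 0.5
    sy = sum((y - y_bar) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical toy data: a continuous ancestry score and a feature that
# mixes an ancestry effect with an unrelated signal.
ancestry = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
signal = [1.0, -1.0, 0.5, -0.5, 1.5, -1.5]
feature = [2.0 * a + s for a, s in zip(ancestry, signal)]

cleaned = residualize(feature, ancestry)
print("correlation before:", correlation(feature, ancestry))
print("correlation after: ", correlation(cleaned, ancestry))
```

In this toy setting the residualized feature is uncorrelated with the ancestry score up to floating-point error. A learned, nonlinear representation such as the one proposed in the article would aim to remove ancestry information beyond this linear case.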

Published in Communications Biology

ISSN: 2399-3642 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General)
Website: https://www.nature.com/commsbio/
