Cell Genomics (Jan 2023)

Global Biobank analyses provide lessons for developing polygenic risk scores across diverse cohorts

  • Ying Wang,
  • Shinichi Namba,
  • Esteban Lopera,
  • Sini Kerminen,
  • Kristin Tsuo,
  • Kristi Läll,
  • Masahiro Kanai,
  • Wei Zhou,
  • Kuan-Han Wu,
  • Marie-Julie Favé,
  • Laxmi Bhatta,
  • Philip Awadalla,
  • Ben Brumpton,
  • Patrick Deelen,
  • Kristian Hveem,
  • Valeria Lo Faro,
  • Reedik Mägi,
  • Yoshinori Murakami,
  • Serena Sanna,
  • Jordan W. Smoller,
  • Jasmina Uzunovic,
  • Brooke N. Wolford,
  • Cristen Willer,
  • Eric R. Gamazon,
  • Nancy J. Cox,
  • Ida Surakka,
  • Yukinori Okada,
  • Alicia R. Martin,
  • Jibril Hirbo,
  • Wei Zhou,
  • Masahiro Kanai,
  • Kuan-Han H. Wu,
  • Humaira Rasheed,
  • Kristin Tsuo,
  • Jibril B. Hirbo,
  • Ying Wang,
  • Arjun Bhattacharya,
  • Huiling Zhao,
  • Shinichi Namba,
  • Ida Surakka,
  • Brooke N. Wolford,
  • Valeria Lo Faro,
  • Esteban A. Lopera-Maya,
  • Kristi Läll,
  • Marie-Julie Favé,
  • Sinéad B. Chapman,
  • Juha Karjalainen,
  • Mitja Kurki,
  • Maasha Mutaamba,
  • Juulia J. Partanen,
  • Ben M. Brumpton,
  • Sameer Chavan,
  • Tzu-Ting Chen,
  • Michelle Daya,
  • Yi Ding,
  • Yen-Chen A. Feng,
  • Christopher R. Gignoux,
  • Sarah E. Graham,
  • Whitney E. Hornsby,
  • Nathan Ingold,
  • Ruth Johnson,
  • Triin Laisk,
  • Kuang Lin,
  • Jun Lv,
  • Iona Y. Millwood,
  • Priit Palta,
  • Anita Pandit,
  • Michael H. Preuss,
  • Unnur Thorsteinsdottir,
  • Jasmina Uzunovic,
  • Matthew Zawistowski,
  • Xue Zhong,
  • Archie Campbell,
  • Kristy Crooks,
  • Geertruida H. de Bock,
  • Nicholas J. Douville,
  • Sarah Finer,
  • Lars G. Fritsche,
  • Christopher J. Griffiths,
  • Yu Guo,
  • Karen A. Hunt,
  • Takahiro Konuma,
  • Riccardo E. Marioni,
  • Jansonius Nomdo,
  • Snehal Patil,
  • Nicholas Rafaels,
  • Anne Richmond,
  • Jonathan A. Shortt,
  • Peter Straub,
  • Ran Tao,
  • Brett Vanderwerff,
  • Kathleen C. Barnes,
  • Marike Boezen,
  • Zhengming Chen,
  • Chia-Yen Chen,
  • Judy Cho,
  • George Davey Smith,
  • Hilary K. Finucane,
  • Lude Franke,
  • Eric R. Gamazon,
  • Andrea Ganna,
  • Tom R. Gaunt,
  • Tian Ge,
  • Hailiang Huang,
  • Jennifer Huffman,
  • Jukka T. Koskela,
  • Clara Lajonchere,
  • Matthew H. Law,
  • Liming Li,
  • Cecilia M. Lindgren,
  • Ruth J.F. Loos,
  • Stuart MacGregor,
  • Koichi Matsuda,
  • Catherine M. Olsen,
  • David J. Porteous,
  • Jordan A. Shavit,
  • Harold Snieder,
  • Richard C. Trembath,
  • Judith M. Vonk,
  • David Whiteman,
  • Stephen J. Wicks,
  • Cisca Wijmenga,
  • John Wright,
  • Jie Zheng,
  • Xiang Zhou,
  • Philip Awadalla,
  • Michael Boehnke,
  • Nancy J. Cox,
  • Daniel H. Geschwind,
  • Caroline Hayward,
  • Kristian Hveem,
  • Eimear E. Kenny,
  • Yen-Feng Lin,
  • Reedik Mägi,
  • Hilary C. Martin,
  • Sarah E. Medland,
  • Yukinori Okada,
  • Aarno V. Palotie,
  • Bogdan Pasaniuc,
  • Serena Sanna,
  • Jordan W. Smoller,
  • Kari Stefansson,
  • David A. van Heel,
  • Robin G. Walters,
  • Sebastian Zöllner,
  • Alicia R. Martin,
  • Cristen J. Willer,
  • Mark J. Daly,
  • Benjamin M. Neale

Journal volume & issue
Vol. 3, no. 1
p. 100241

Abstract

Summary: Polygenic risk scores (PRSs) have been widely explored in precision medicine. However, few studies have thoroughly investigated best practices for PRSs in global populations across different diseases. Here, we used data from the Global Biobank Meta-analysis Initiative (GBMI) to explore methodological considerations and PRS performance in 9 different biobanks for 14 disease endpoints. Specifically, we constructed PRSs using pruning and thresholding (P + T) and PRS-continuous shrinkage (CS). For both methods, using a European-based linkage disequilibrium (LD) reference panel resulted in comparable or higher prediction accuracy compared with several other non-European-based panels. PRS-CS overall outperformed the classic P + T method, especially for endpoints with higher SNP-based heritability. Notably, prediction accuracy is heterogeneous across endpoints, biobanks, and ancestries, especially for asthma, which has known variation in disease prevalence across populations. Overall, we provide lessons for PRS construction, evaluation, and interpretation using GBMI resources and highlight the importance of best practices for PRSs in the biobank-scale genomics era.
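
For readers unfamiliar with P + T, the following is a minimal Python sketch of the general idea, not the pipeline used in the study. It assumes per-variant GWAS effect sizes and p-values plus a pairwise LD (r²) matrix from a reference panel. The function names, the toy data, and the thresholds (`p_max`, `r2_max`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def prune_and_threshold(pvals, r2, p_max=5e-8, r2_max=0.1):
    """Greedy LD clumping with a p-value cutoff (illustrative thresholds).

    Walk variants from most to least significant; keep a variant if it
    passes the p-value threshold and is not in LD (r2 > r2_max) with a
    variant that was already kept.
    """
    order = np.argsort(pvals)
    removed = np.zeros(len(pvals), dtype=bool)
    kept = []
    for i in order:
        if pvals[i] > p_max:
            break  # all remaining variants are less significant
        if removed[i]:
            continue
        kept.append(int(i))
        # Drop every variant correlated with the one just kept
        # (this also marks variant i itself, since r2[i, i] == 1).
        removed |= r2[i] > r2_max
    return kept

def polygenic_score(genotypes, betas, kept):
    """Weighted sum of allele counts over the retained variants."""
    return genotypes[:, kept] @ betas[kept]

# Toy data (assumed): 4 variants, 3 individuals.
pvals = np.array([1e-10, 1e-9, 0.2, 1e-8])
betas = np.array([0.5, 0.4, 0.1, -0.2])
r2 = np.eye(4)
r2[0, 1] = r2[1, 0] = 0.8  # variants 0 and 1 are in strong LD
genotypes = np.array([[2, 2, 1, 0],
                      [0, 0, 2, 1],
                      [1, 1, 0, 2]])

kept = prune_and_threshold(pvals, r2)
scores = polygenic_score(genotypes, betas, kept)
print(kept, scores)
```

In this toy example, variant 1 is dropped because it is in LD with the more significant variant 0, and variant 2 fails the p-value cutoff. PRS-CS differs in that it retains all variants and shrinks their effect sizes under a continuous shrinkage prior instead of applying hard cutoffs.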

Keywords