Trait selection strategy in multi-trait GWAS: Boosting SNP discoverability
Yuka Suzuki,
Hervé Ménager,
Bryan Brancotte,
Raphaël Vernet,
Cyril Nerin,
Christophe Boetto,
Antoine Auvergne,
Christophe Linhard,
Rachel Torchet,
Pierre Lechat,
Lucie Troubat,
Michael H. Cho,
Emmanuelle Bouzigon,
Hugues Aschard,
Hanna Julienne
Affiliations
Yuka Suzuki
Institut Pasteur, Université Paris Cité, Department of Computational Biology, 75015 Paris, France; Corresponding author
Hervé Ménager
Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 75015 Paris, France
Bryan Brancotte
Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 75015 Paris, France
Raphaël Vernet
Université Paris Cité, Institut National de la Santé et de la Recherche Médicale (INSERM), UMR-1124, Group of Genomic Epidemiology of Multifactorial Diseases, Paris, France
Cyril Nerin
Institut Pasteur, Université Paris Cité, Department of Computational Biology, 75015 Paris, France
Christophe Boetto
Institut Pasteur, Université Paris Cité, Department of Computational Biology, 75015 Paris, France
Antoine Auvergne
Institut Pasteur, Université Paris Cité, Department of Computational Biology, 75015 Paris, France
Christophe Linhard
Université Paris Cité, Institut National de la Santé et de la Recherche Médicale (INSERM), UMR-1124, Group of Genomic Epidemiology of Multifactorial Diseases, Paris, France
Rachel Torchet
Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 75015 Paris, France
Pierre Lechat
Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 75015 Paris, France
Lucie Troubat
Institut Pasteur, Université Paris Cité, Department of Computational Biology, 75015 Paris, France
Michael H. Cho
Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 181 Longwood Avenue, Boston, MA 02115, USA; Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Emmanuelle Bouzigon
Université Paris Cité, Institut National de la Santé et de la Recherche Médicale (INSERM), UMR-1124, Group of Genomic Epidemiology of Multifactorial Diseases, Paris, France
Hugues Aschard
Institut Pasteur, Université Paris Cité, Department of Computational Biology, 75015 Paris, France; Corresponding author
Hanna Julienne
Institut Pasteur, Université Paris Cité, Department of Computational Biology, 75015 Paris, France; Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 75015 Paris, France; Corresponding author
Summary: Since the first genome-wide association studies (GWASs), thousands of variant-trait associations have been discovered. However, comprehensively mapping the genetic determinants of complex traits through univariate testing can require prohibitive sample sizes. Multi-trait GWAS can circumvent this issue and improve statistical power by leveraging the joint genetic architecture of human phenotypes. Although many methodological hurdles of multi-trait testing have been solved, the strategy for selecting traits has been overlooked. In this study, we conducted multi-trait GWAS on approximately 20,000 combinations of 72 traits using an omnibus test as implemented in the Joint Analysis of Summary Statistics (JASS). We assessed which genetic features of the analyzed trait sets were associated with increased detection of variants compared with univariate screening. Several features of a trait set, including heritability, the number of traits, and genetic correlation, drive the gain of the multi-trait test. Using these features jointly in predictive models captures a large fraction of the power gain of the multi-trait test (Pearson’s r between observed and predicted gain = 0.43, p < 1.6 × 10⁻⁶⁰). Applying an alternative multi-trait approach, Multi-Trait Analysis of GWAS (MTAG), we identified similar features of interest, but with 70% fewer new associations overall. Finally, selecting sets based on our data-driven models systematically outperformed the common strategy of selecting clinically similar traits. This work provides a unique picture of the determinants of multi-trait GWAS statistical power and outlines practical strategies for multi-trait testing.
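The summary does not spell out the omnibus statistic. As a rough illustration only, the sketch below assumes the usual summary-statistic form, T = zᵀ Σ⁻¹ z, where z holds the per-trait z-scores for one variant and Σ is their correlation under the null, so that T follows a chi-square distribution with as many degrees of freedom as traits. It is restricted to two traits, where both the matrix inverse and the chi-square tail have closed forms. The function names and the input values are hypothetical and are not taken from the study.

```python
import math


def univariate_p(z):
    """Two-sided p-value for a single z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def omnibus_two_traits(z1, z2, rho):
    """Omnibus statistic and p-value for one variant across two traits.

    Assumption (not from the source text): T = z' Sigma^{-1} z with
    Sigma = [[1, rho], [rho, 1]], the null correlation of the two z-scores.
    For two traits, T is chi-square with 2 degrees of freedom under the null,
    and the survival function of that distribution is exp(-T / 2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Closed-form quadratic form for the inverse of a 2x2 correlation matrix.
    stat = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical variant: modest, opposite-sign signals on two traits
    # whose z-scores are positively correlated under the null.
    stat, p_joint = omnibus_two_traits(3.0, -3.0, 0.5)
    print(f"omnibus T = {stat:.1f}, joint p = {p_joint:.2e}")
    print(f"univariate p (either trait) = {univariate_p(3.0):.2e}")
```

With these made-up inputs, neither trait would reach a genome-wide threshold on its own, while the joint statistic (T = 36) is far more significant. In this formula, signals that run against the null correlation inflate T, which is one way trait-set properties such as correlation can affect multi-trait gain.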