Center for Statistical Science, Tsinghua University, Beijing, China; Department of Industrial Engineering, Tsinghua University, Beijing, China
Lin Hou
Center for Statistical Science, Tsinghua University, Beijing, China; Department of Industrial Engineering, Tsinghua University, Beijing, China; MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
Yu Shi
Yale School of Management, Yale University, New Haven, United States
Sheng Chih Jin
Department of Genetics, Washington University in St. Louis, St. Louis, United States
Xue Zeng
Department of Genetics, Yale University, New Haven, United States; Laboratory of Human Genetics and Genomics, Rockefeller University, New York, United States
Boyang Li
Department of Biostatistics, Yale School of Public Health, New Haven, United States
Richard P Lifton
Department of Genetics, Yale University, New Haven, United States; Laboratory of Human Genetics and Genomics, Rockefeller University, New York, United States
Martina Brueckner
Department of Genetics, Yale University, New Haven, United States; Department of Pediatrics, Yale University, New Haven, United States
Hongyu Zhao
Department of Genetics, Yale University, New Haven, United States; Department of Biostatistics, Yale School of Public Health, New Haven, United States; Program of Computational Biology and Bioinformatics, Yale University, New Haven, United States
Exome sequencing of tens of thousands of parent-proband trios has identified numerous deleterious de novo mutations (DNMs) and implicated risk genes for many disorders. Recent studies have suggested that shared genes and pathways are enriched for DNMs across multiple disorders. However, existing analytic strategies focus only on genes that reach statistical significance for multiple disorders and require large trio samples in each study. As a result, because of polygenicity and incomplete penetrance, these methods cannot characterize the full landscape of genetic sharing. In this work, we introduce EncoreDNM, a novel statistical framework to quantify shared genetic effects between two disorders characterized by concordant enrichment of DNMs in the exome. EncoreDNM makes use of exome-wide, summary-level DNM data, including genes that do not reach statistical significance in single-disorder analysis, to evaluate the overall and annotation-partitioned genetic sharing between two disorders. Applying EncoreDNM to DNM data for nine disorders, we identified abundant pairwise enrichment correlations, especially in genes intolerant to pathogenic mutations and genes highly expressed in fetal tissues. These results suggest that EncoreDNM improves on current analytic approaches and may have broad applications in DNM studies.