PLoS Genetics (Aug 2013)

Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes.

  • Xin He,
  • Stephan J Sanders,
  • Li Liu,
  • Silvia De Rubeis,
  • Elaine T Lim,
  • James S Sutcliffe,
  • Gerard D Schellenberg,
  • Richard A Gibbs,
  • Mark J Daly,
  • Joseph D Buxbaum,
  • Matthew W State,
  • Bernie Devlin,
  • Kathryn Roeder

DOI
https://doi.org/10.1371/journal.pgen.1003671
Journal volume & issue
Vol. 9, no. 8
p. e1003671

Abstract

De novo mutations affect risk for many diseases and disorders, especially those with early onset. An example is autism spectrum disorders (ASD). Four recent whole-exome sequencing (WES) studies of ASD families revealed a handful of novel risk genes, based on independent de novo loss-of-function (LoF) mutations falling in the same gene, and found that de novo LoF mutations occurred at a twofold higher rate than expected by chance. However successful these studies were, they used only a small fraction of the data, excluding other types of de novo mutations and inherited rare variants. Moreover, such analyses cannot readily incorporate data from case-control studies. An important research challenge in gene discovery, therefore, is to develop statistical methods that accommodate a broader class of rare variation. We develop methods that can incorporate WES data regarding de novo mutations, inherited variants, and variants identified within cases and controls. TADA, for Transmission And De novo Association, integrates these data by a gene-based likelihood model involving parameters for allele frequencies and gene-specific penetrances. Inference is based on a Hierarchical Bayes strategy that borrows information across all genes to infer parameters that would be difficult to estimate for individual genes. In addition to theoretical development, we validated TADA using realistic simulations mimicking rare, large-effect mutations affecting risk for ASD and showed that it has dramatically better power than other common methods of analysis. Thus, TADA's integration of various kinds of WES data can be a highly effective means of identifying novel risk genes. Indeed, application of TADA to WES data from subjects with ASD and their families, as well as from a study of ASD subjects and controls, revealed several novel and promising ASD candidate genes with strong statistical support.
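To make the idea of a gene-based likelihood with a prior on effect size concrete, here is a minimal sketch of a Bayes factor for de novo counts in a single gene. It is an illustration under stated assumptions, not the model as specified in the paper: the abstract does not give the likelihood, so the Poisson form for the count, the Gamma prior on relative risk, and every name and parameter value below (`n_families`, `mu`, `shape`, `rate`) are hypothetical choices made for this example.

```python
import math


def log_poisson_pmf(x, lam):
    """Log of the Poisson probability of observing x events with mean lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def log_gamma_poisson_pmf(x, lam, shape, rate):
    """Log marginal probability of x when x | g ~ Poisson(lam * g)
    and g ~ Gamma(shape, rate). This is a negative binomial."""
    return (
        math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
        + shape * math.log(rate / (rate + lam))
        + x * math.log(lam / (rate + lam))
    )


def de_novo_bayes_factor(x, n_families, mu, shape, rate):
    """Bayes factor comparing 'risk gene' against 'non-risk gene'.

    Assumptions (all hypothetical, for illustration only):
      - x de novo mutations are observed in n_families trios
      - mu is the per-chromosome mutation rate of the gene, so the
        expected count under the null is 2 * n_families * mu
      - under the alternative, the rate is multiplied by a relative
        risk with a Gamma(shape, rate) prior
    """
    lam = 2 * n_families * mu
    log_alt = log_gamma_poisson_pmf(x, lam, shape, rate)
    log_null = log_poisson_pmf(x, lam)
    return math.exp(log_alt - log_null)


if __name__ == "__main__":
    # Hypothetical inputs: 1000 trios, mutation rate 1e-6, prior mean risk 20.
    for count in range(4):
        bf = de_novo_bayes_factor(count, n_families=1000, mu=1e-6,
                                  shape=4.0, rate=0.2)
        print(count, bf)
```

With these made-up inputs, a gene with no de novo events gets a Bayes factor slightly below 1, and each additional event raises it sharply. A hierarchical Bayes approach, as the abstract describes, would estimate prior parameters such as `shape` and `rate` from all genes jointly rather than fixing them by hand.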