Department of Biomedical Informatics, Harvard Medical School, Boston, United States; Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
Sumaiya Nazeen
Department of Biomedical Informatics, Harvard Medical School, Boston, United States; Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States; Brigham and Women’s Hospital, Department of Neurology, Harvard Medical School, Boston, United States
Daniel Lee
Department of Biomedical Informatics, Harvard Medical School, Boston, United States; Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
Huwenbo Shi
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, United States
John Stamatoyannopoulos
Altius Institute, Seattle, United States
Sung Chun
Division of Pulmonary Medicine, Boston Children’s Hospital, Boston, United States
Chris Cotsapas
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States; Department of Neurology, Yale Medical School, New Haven, United States; Department of Genetics, Yale Medical School, New Haven, United States
Christopher A Cassa
Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
Department of Biomedical Informatics, Harvard Medical School, Boston, United States; Brigham and Women’s Hospital, Division of Genetics, Harvard Medical School, Boston, United States; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
The genetic basis of most traits is highly polygenic and dominated by non-coding alleles. It is widely assumed that such alleles exert small regulatory effects on the expression of cis-linked genes. However, despite the availability of gene expression and epigenomic datasets, few variant-to-gene links have emerged. It is unclear whether these sparse results are due to limitations in available data and methods, or to deficiencies in the underlying assumed model. To better distinguish between these possibilities, we identified 220 gene–trait pairs in which protein-coding variants influence a complex trait or its Mendelian cognate. Despite the presence of expression quantitative trait loci near most GWAS associations, by applying a gene-based approach we found limited evidence that the baseline expression of trait-related genes explains GWAS associations, whether using colocalization methods (8% of genes implicated), transcriptome-wide association (2% of genes implicated), or a combination of regulatory annotations and distance (4% of genes implicated). These results contradict the hypothesis that most complex trait-associated variants coincide with homeostatic expression QTLs, suggesting that better models are needed. The field must confront this deficit and pursue this ‘missing regulation.’