Powerful eQTL mapping through low-coverage RNA sequencing
Tommer Schwarz,
Toni Boltz,
Kangcheng Hou,
Merel Bot,
Chenda Duan,
Loes Olde Loohuis,
Marco P. Boks,
René S. Kahn,
Roel A. Ophoff,
Bogdan Pasaniuc
Affiliations
Tommer Schwarz
Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; Corresponding author
Toni Boltz
Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
Kangcheng Hou
Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
Merel Bot
Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, CA, USA
Chenda Duan
Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
Loes Olde Loohuis
Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Program in Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
Marco P. Boks
Department of Psychiatry, Brain Center University Medical Center Utrecht, University Utrecht, Utrecht, the Netherlands
René S. Kahn
Department of Psychiatry, Brain Center University Medical Center Utrecht, University Utrecht, Utrecht, the Netherlands; Department of Psychiatry, Icahn School of Medicine, Mount Sinai, NY, USA
Roel A. Ophoff
Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; Department of Psychiatry, Erasmus University Medical Center, Rotterdam, the Netherlands
Bogdan Pasaniuc
Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; Corresponding author
Summary: Mapping genetic variants that regulate gene expression (eQTL mapping) in large-scale RNA sequencing (RNA-seq) studies is often employed to understand the functional consequences of regulatory variants. However, the high cost of RNA-seq limits sample size, sequencing depth, and, therefore, discovery power in eQTL studies. In this work, we demonstrate that, given a fixed budget, eQTL discovery power can be increased by lowering the sequencing depth per sample and increasing the number of individuals sequenced in the assay. We perform RNA-seq of whole-blood tissue across 1,490 individuals at low coverage (5.9 million reads/sample) and show that the effective power is higher than that of an RNA-seq study of 570 individuals at moderate coverage (13.9 million reads/sample). Next, we leverage synthetic datasets derived from real RNA-seq data (50 million reads/sample) to explore the interplay of coverage and number of individuals in eQTL studies, and show that a 10-fold reduction in coverage leads to only a 2.5-fold reduction in statistical power to identify eQTLs. Our work suggests that lowering coverage while increasing the number of individuals in RNA-seq is an effective approach to increase discovery power in eQTL studies.
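The fixed-budget comparison in the summary can be made concrete with a little arithmetic. The sketch below is illustrative only: it treats the total number of sequenced reads as a rough proxy for sequencing cost, which is an assumption not stated in the summary (per-sample costs such as library preparation are ignored), and the `Design` class and its field names are hypothetical. The sample sizes and read depths are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """A hypothetical container for one RNA-seq study design."""
    name: str
    n_individuals: int
    reads_per_sample: int  # average reads per sample

    @property
    def total_reads(self) -> int:
        # Assumption: total reads stands in for sequencing cost.
        return self.n_individuals * self.reads_per_sample


# Figures quoted in the summary.
low = Design("low coverage", n_individuals=1490, reads_per_sample=5_900_000)
moderate = Design("moderate coverage", n_individuals=570, reads_per_sample=13_900_000)

for design in (low, moderate):
    print(f"{design.name}: {design.n_individuals} individuals, "
          f"{design.total_reads / 1e9:.2f} billion reads in total")

# How the two designs compare on sample size and on total reads.
sample_ratio = low.n_individuals / moderate.n_individuals
total_reads_ratio = low.total_reads / moderate.total_reads
print(f"sample size ratio (low/moderate): {sample_ratio:.2f}")
print(f"total reads ratio (low/moderate): {total_reads_ratio:.2f}")
```

Under this simplification, the low-coverage design sequences roughly 2.6 times as many individuals while using only about 11% more reads in total, which is the kind of trade-off the summary describes. It says nothing about power by itself; that comparison comes from the study's own analyses.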