Genome Biology (Feb 2018)
ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues
- Serghei Mangul
- Harry Taegyun Yang
- Nicolas Strauli
- Franziska Gruhl
- Hagit T. Porath
- Kevin Hsieh
- Linus Chen
- Timothy Daley
- Stephanie Christenson
- Agata Wesolowska-Andersen
- Roberto Spreafico
- Cydney Rios
- Celeste Eng
- Andrew D. Smith
- Ryan D. Hernandez
- Roel A. Ophoff
- Jose Rodriguez Santana
- Erez Y. Levanon
- Prescott G. Woodruff
- Esteban Burchard
- Max A. Seibold
- Sagiv Shifman
- Eleazar Eskin
- Noah Zaitlen
Affiliations
- Serghei Mangul
- Department of Computer Science, University of California
- Harry Taegyun Yang
- Department of Computer Science, University of California
- Nicolas Strauli
- Biomedical Sciences Graduate Program, University of California
- Franziska Gruhl
- Center for Integrative Genomics, University of Lausanne
- Hagit T. Porath
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University
- Kevin Hsieh
- Department of Computer Science, University of California
- Linus Chen
- Department of Bioengineering, University of California
- Timothy Daley
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California
- Stephanie Christenson
- Division of Pulmonary, Critical Care, Sleep and Allergy, Department of Medicine, and Cardiovascular Research Institute, University of California
- Agata Wesolowska-Andersen
- Center for Genes, Environment, and Health, National Jewish Health
- Roberto Spreafico
- Institute for Quantitative and Computational Biosciences, University of California
- Cydney Rios
- Center for Genes, Environment, and Health, National Jewish Health
- Celeste Eng
- Department of Medicine, University of California
- Andrew D. Smith
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California
- Ryan D. Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California
- Roel A. Ophoff
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California
- Jose Rodriguez Santana
- Centro de Neumología Pediátrica
- Erez Y. Levanon
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University
- Prescott G. Woodruff
- Division of Pulmonary, Critical Care, Sleep and Allergy, Department of Medicine, and Cardiovascular Research Institute, University of California
- Esteban Burchard
- Schools of Pharmacy and Medicine, Department of Bioengineering and Therapeutic Sciences, University of California
- Max A. Seibold
- Department of Pediatrics, National Jewish Health
- Sagiv Shifman
- Department of Genetics, The Institute of Life Sciences, The Hebrew University of Jerusalem
- Eleazar Eskin
- Department of Computer Science, University of California
- Noah Zaitlen
- Division of Pulmonary, Critical Care, Sleep and Allergy, Department of Medicine, and Cardiovascular Research Institute, University of California
- DOI
- https://doi.org/10.1186/s13059-018-1403-7
- Journal volume & issue
- Vol. 19, no. 1, pp. 1 – 12
Abstract
High-throughput RNA-sequencing (RNA-seq) technologies provide an unprecedented opportunity to explore the individual transcriptome. Unmapped reads are a large and often overlooked output of standard RNA-seq analyses. Here, we present Read Origin Protocol (ROP), a tool for discovering the source of all reads originating from complex RNA molecules. We apply ROP to samples across 2630 individuals from 54 diverse human tissues. Our approach can account for 99.9% of 1 trillion reads of various read lengths. Additionally, we use ROP to investigate the functional mechanisms underlying connections between the immune system, microbiome, and disease. ROP is freely available at https://github.com/smangul1/rop/wiki.