shRNA-seq data analysis with edgeR [v1; ref status: indexed, http://f1000r.es/38s]

Zhiyin Dai; Julie M. Sheridan; Linden J. Gearing; Darcy L. Moore; Shian Su; Ross A. Dickins; Marnie E. Blewitt; Matthew E. Ritchie

doi:10.12688/f1000research.3928.1

F1000Research (Apr 2014)

Affiliations

Zhiyin Dai: Molecular Medicine Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
Julie M. Sheridan: Department of Medical Biology, The University of Melbourne, Parkville, Victoria, 3010, Australia
Linden J. Gearing: Department of Medical Biology, The University of Melbourne, Parkville, Victoria, 3010, Australia
Darcy L. Moore: Department of Medical Biology, The University of Melbourne, Parkville, Victoria, 3010, Australia
Shian Su: Molecular Medicine Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
Ross A. Dickins: Department of Medical Biology, The University of Melbourne, Parkville, Victoria, 3010, Australia
Marnie E. Blewitt: Department of Medical Biology, The University of Melbourne, Parkville, Victoria, 3010, Australia
Matthew E. Ritchie: Department of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, 3010, Australia

Journal volume & issue: Vol. 3

Abstract

Pooled short hairpin RNA sequencing (shRNA-seq) screens are becoming increasingly popular in functional genomics research, and there is a need to establish optimal analysis tools to handle such data. Our open-source shRNA processing pipeline in edgeR provides a complete analysis solution for shRNA-seq screen data that begins with the raw sequence reads and ends with a ranked list of candidate shRNAs for downstream biological validation. We first summarize the raw data contained in a fastq file into a matrix of counts (samples in the columns, hairpins in the rows), with options for allowing mismatches and small shifts in hairpin position. Diagnostic plots, normalization and differential representation analysis can then be performed using established methods to prioritize results in a statistically rigorous way, with the choice of either the classic exact testing methodology or a generalized linear modelling approach that can handle complex experimental designs. A detailed users’ guide that demonstrates how to analyze screen data in edgeR and a point-and-click implementation of this workflow in Galaxy are also provided. The edgeR package is freely available from http://www.bioconductor.org.
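
The counting step described in the abstract (reads summarized into a hairpins-by-samples matrix, tolerating mismatches and small positional shifts) can be illustrated with a minimal Python sketch. This is not the edgeR implementation: the function names, the parameters `start`, `max_mismatch` and `max_shift`, and the toy sequences are all hypothetical, and the reads are assumed to be already assigned to samples and supplied as plain strings rather than parsed from a fastq file.

```python
# Illustrative sketch only; not the edgeR code. All names are hypothetical.

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_hairpin(read, hairpins, start=0, max_mismatch=0, max_shift=0):
    """Return the id of the best-matching hairpin for one read, or None.

    The hairpin is expected at offset `start` in the read; offsets within
    `max_shift` bases of `start` are also tried, closest offsets first.
    """
    best = None  # (mismatches, hairpin_id)
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        pos = start + shift
        if pos < 0:
            continue
        for hid, seq in hairpins.items():
            window = read[pos:pos + len(seq)]
            if len(window) != len(seq):
                continue  # read too short at this offset
            d = hamming(window, seq)
            if d <= max_mismatch and (best is None or d < best[0]):
                best = (d, hid)
    return None if best is None else best[1]


def count_matrix(samples, hairpins, **kwargs):
    """Build {hairpin_id: [count per sample]}: hairpins in rows, samples in columns."""
    matrix = {hid: [0] * len(samples) for hid in hairpins}
    for col, reads in enumerate(samples.values()):
        for read in reads:
            hid = assign_hairpin(read, hairpins, **kwargs)
            if hid is not None:
                matrix[hid][col] += 1
    return matrix


# Toy data: two hairpins, two samples (made up for illustration).
hairpins = {"h1": "ACGT", "h2": "TTGA"}
samples = {
    "s1": ["ACGT", "ACGA", "TTGA"],
    "s2": ["GACGT", "TTGA", "CCCC"],
}

strict = count_matrix(samples, hairpins)
relaxed = count_matrix(samples, hairpins, max_mismatch=1, max_shift=1)
```

With exact matching, the read `ACGA` (one mismatch) and the read `GACGT` (hairpin shifted by one base) are both discarded; allowing one mismatch and a one-base shift recovers them as counts for `h1`. The resulting matrix has the layout the abstract describes and would be the input to normalization and differential representation analysis.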

Bioinformatics

Published in F1000Research

ISSN: 2046-1402 (Online)
Publisher: F1000 Research Ltd
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://f1000research.com
