F1000Research (Apr 2014)

shRNA-seq data analysis with edgeR [v1; ref status: indexed, http://f1000r.es/38s]

  • Zhiyin Dai,
  • Julie M. Sheridan,
  • Linden J. Gearing,
  • Darcy L. Moore,
  • Shian Su,
  • Ross A. Dickins,
  • Marnie E. Blewitt,
  • Matthew E. Ritchie

DOI
https://doi.org/10.12688/f1000research.3928.1
Journal volume & issue
Vol. 3

Abstract


Pooled short hairpin RNA sequencing (shRNA-seq) screens are becoming increasingly popular in functional genomics research, and there is a need to establish optimal analysis tools to handle such data. Our open-source shRNA processing pipeline in edgeR provides a complete analysis solution for shRNA-seq screen data that begins with the raw sequence reads and ends with a ranked list of candidate shRNAs for downstream biological validation. We first summarize the raw data contained in a fastq file into a matrix of counts (samples in the columns, hairpins in the rows), with options for allowing mismatches and small shifts in hairpin position. Diagnostic plots, normalization and differential representation analysis can then be performed using established methods to prioritize results in a statistically rigorous way, with the choice of either the classic exact testing methodology or a generalized linear modelling approach that can handle complex experimental designs. A detailed users’ guide that demonstrates how to analyze screen data in edgeR is also provided, along with a point-and-click implementation of this workflow in Galaxy. The edgeR package is freely available from http://www.bioconductor.org.
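
The count-summarization step described above can be illustrated with a minimal Python sketch. It is not the edgeR implementation: it assumes each read starts with a sample barcode followed immediately by the hairpin sequence, and it uses exact matching only, so the mismatch and shift options mentioned in the abstract are left out. The function name `count_hairpins`, the read layout, and all sequences and identifiers below are hypothetical and chosen purely for illustration.

```python
def count_hairpins(reads, barcodes, hairpins, barcode_len, hairpin_len):
    """Tally reads into a hairpin-by-sample table of counts.

    Assumed (hypothetical) read layout: barcode first, hairpin directly after.
    Exact matching only; reads with an unknown barcode or hairpin are skipped.
    """
    # Reverse lookups: sequence -> identifier
    barcode_to_sample = {seq: name for name, seq in barcodes.items()}
    hairpin_to_id = {seq: name for name, seq in hairpins.items()}

    # Rows are hairpins, columns are samples, as in the abstract
    counts = {h: {s: 0 for s in barcodes} for h in hairpins}

    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len])
        hairpin = hairpin_to_id.get(read[barcode_len:barcode_len + hairpin_len])
        if sample is not None and hairpin is not None:
            counts[hairpin][sample] += 1
    return counts


# Toy inputs (invented for illustration, not real screen data)
barcodes = {"S1": "AAGT", "S2": "CCTG"}
hairpins = {"sh1": "ACGTACGT", "sh2": "TTGGCCAA"}
reads = [
    "AAGTACGTACGT",  # S1, sh1
    "AAGTACGTACGT",  # S1, sh1
    "CCTGTTGGCCAA",  # S2, sh2
    "AAGTTTGGCCAA",  # S1, sh2
    "GGGGACGTACGT",  # unknown barcode, skipped
    "CCTGACGTAAAA",  # unknown hairpin, skipped
]

counts = count_hairpins(reads, barcodes, hairpins, barcode_len=4, hairpin_len=8)
for hairpin_id, row in counts.items():
    print(hairpin_id, row)
```

A table of this shape (hairpins in rows, samples in columns) is the kind of input the downstream normalization and differential representation steps operate on.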

Keywords