Вавиловский журнал генетики и селекции (May 2016)

The design of experiments for the transcriptome studies by high-throughput sequencing methods

  • P. N. Menshanov,
  • N. N. Dygalo

DOI
https://doi.org/10.18699/VJ16.148
Journal volume & issue
Vol. 20, no. 2
pp. 247 – 254

Abstract

This review addresses common questions in the design of high-throughput sequencing experiments that use RNA-Seq or Ribo-Seq. It also summarizes the ENCODE guidelines (2011) and recently published advances in the design of studies of mammalian, other animal, and plant transcriptomes. An optimal sequencing depth exists for identifying almost all actively transcribed genes; it depends on the transcriptome size of the biological object studied, and sequencing beyond it adds no substantial information about transcriptome complexity. For mammals, this optimal depth is ~2 × 10⁹ bp per biological sample. For other species, the optimal depth per biological sample is determined in the same way as for mammals, but the transcriptome size and the mean RNA content of the studied object, relative to mammalian transcriptomes, should be taken into account. The discovery of differentially expressed genes and the identification of splice sites in mRNA can be improved by increasing the number of biological samples analyzed per experimental group. The minimum number of biological replicates per experimental group is 2, but the optimal number is 5–8 (similar to experiments that quantify the expression of single genes by qRT-PCR). For transcriptome studies, sequencing technologies with an accuracy of ≥ 0.999 per bp are recommended. For RNA-Seq, technologies that produce reads of 75 bp or longer are also recommended, to minimize the cost of effective sequence identification.
The relative cost of sequencing the control samples can be reduced by increasing the number of experimental groups in an experiment or by combining several independent experiments that have similar control groups. These notes can be used at the design stage of experimental transcriptome studies.
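
The arithmetic behind these recommendations can be sketched in a few lines of Python. This is only an illustrative calculation, not a method from the review: the function names are hypothetical, and the sketch assumes equal group sizes, an equal sequencing cost per sample, and that depth is counted simply as reads × read length. The only figures taken from the abstract are the mammalian depth (~2 × 10⁹ bp per sample), the 75 bp read length, and the 5–8 replicates per group.

```python
import math

# Value quoted in the abstract for mammals (bp per biological sample).
MAMMAL_DEPTH_BP = 2e9


def reads_needed(depth_bp: float, read_length_bp: int) -> int:
    """Number of reads of a given length needed to reach a target depth in bp."""
    return math.ceil(depth_bp / read_length_bp)


def total_bases(n_groups: int, replicates_per_group: int, depth_bp: float) -> float:
    """Total bases to sequence, assuming every sample gets the same depth."""
    return n_groups * replicates_per_group * depth_bp


def control_share(n_experimental_groups: int, n_control_groups: int = 1) -> float:
    """Fraction of sequencing spent on controls, assuming equal-sized groups
    and an equal cost per sample (both are assumptions of this sketch)."""
    return n_control_groups / (n_experimental_groups + n_control_groups)


if __name__ == "__main__":
    # Reads of 75 bp needed for one mammalian sample at the quoted depth.
    print(reads_needed(MAMMAL_DEPTH_BP, 75))

    # One control and one experimental group, 5 replicates each.
    print(total_bases(n_groups=2, replicates_per_group=5, depth_bp=MAMMAL_DEPTH_BP))

    # Share of sequencing that goes to the control group as the number of
    # experimental groups sharing that control grows.
    for k in (1, 2, 4):
        print(k, control_share(k))
```

Under these assumptions, a single shared control group takes half of the sequencing effort when there is one experimental group and one fifth when there are four, which is the effect the abstract describes.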

Keywords