Optimization of an RNA-Seq Differential Gene Expression Analysis Depending on Biological Replicate Number and Library Size

Sophie Lamarre; Pierre Frasse; Mohamed Zouine; Delphine Labourdette; Elise Sainderichin; Guojian Hu; Véronique Le Berre-Anton; Mondher Bouzayen; Elie Maza

doi:10.3389/fpls.2018.00108

Frontiers in Plant Science (Feb 2018)

Affiliations

Sophie Lamarre: LISBP, Centre National de la Recherche Scientifique, INRA, INSA, Université de Toulouse, Toulouse, France
Pierre Frasse: GBF, Université de Toulouse, INRA, Castanet-Tolosan, France
Mohamed Zouine: GBF, Université de Toulouse, INRA, Castanet-Tolosan, France
Delphine Labourdette: LISBP, Centre National de la Recherche Scientifique, INRA, INSA, Université de Toulouse, Toulouse, France
Elise Sainderichin: GBF, Université de Toulouse, INRA, Castanet-Tolosan, France
Guojian Hu: GBF, Université de Toulouse, INRA, Castanet-Tolosan, France
Véronique Le Berre-Anton: LISBP, Centre National de la Recherche Scientifique, INRA, INSA, Université de Toulouse, Toulouse, France
Mondher Bouzayen: GBF, Université de Toulouse, INRA, Castanet-Tolosan, France
Elie Maza: GBF, Université de Toulouse, INRA, Castanet-Tolosan, France

DOI: https://doi.org/10.3389/fpls.2018.00108
Journal volume & issue: Vol. 9

Abstract

RNA-Seq is a widely used technology that allows efficient genome-wide quantification of gene expression, for example for differential expression (DE) analysis. After a brief review of the main issues, methods and tools related to the DE analysis of RNA-Seq data, this article focuses on the impact of both the replicate number and library size in such analyses. Because the main drawback of previous relevant studies is their lack of generality, we conducted both an analysis of a two-condition experiment (with eight biological replicates per condition), to compare the results with previous benchmark studies, and a meta-analysis of 17 experiments with up to 18 biological conditions, eight biological replicates and 100 million (M) reads per sample. As a global trend, we concluded that the replicate number has a larger impact than the library size on the power of the DE analysis, except for low-expressed genes, for which both parameters seem to have the same impact. Our study also provides new insights for practitioners aiming to enhance their experimental designs. For instance, by analyzing both the sensitivity and specificity of the DE analysis, we showed that the optimal threshold to control the false discovery rate (FDR) is approximately 2^(-r), where r is the replicate number. Furthermore, we showed that the false positive rate (FPR) is rather well controlled by all three studied R packages: DESeq, DESeq2, and edgeR. We also analyzed the impact of both the replicate number and library size on gene ontology (GO) enrichment analysis. Interestingly, we concluded that increases in the replicate number and library size tend to enhance the sensitivity and specificity, respectively, of the GO analysis. Finally, we recommend that RNA-Seq practitioners either produce a pilot data set to strictly analyze the power of their experimental design, or use a public data set similar to the one they will obtain. For individuals working on tomato research, on the basis of the meta-analysis, we recommend at least four biological replicates per condition and 20 M reads per sample to be almost sure of obtaining about 1000 DE genes if they exist.
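
As an illustration of the replicate-dependent FDR threshold described in the abstract, the following minimal Python sketch computes the cutoff 2^(-r) and applies it to a list of p-values. It rests on assumptions that the abstract does not state: that the threshold is used as a cutoff on adjusted p-values, and that the adjustment is the Benjamini-Hochberg procedure. The function names `fdr_threshold`, `benjamini_hochberg` and `call_de_genes` are hypothetical and are not part of DESeq, DESeq2 or edgeR.

```python
def fdr_threshold(replicates: int) -> float:
    """Rule-of-thumb FDR cutoff 2^(-r), with r the replicate number."""
    if replicates < 1:
        raise ValueError("replicates must be a positive integer")
    return 2.0 ** -replicates


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in input order.

    Assumption: BH is used here only as a common FDR adjustment;
    the abstract does not name the procedure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone (each is capped by the next one).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_de_genes(pvalues, replicates):
    """Flag genes whose adjusted p-value is at or below 2^(-r)."""
    cutoff = fdr_threshold(replicates)
    return [adj <= cutoff for adj in benjamini_hochberg(pvalues)]


if __name__ == "__main__":
    toy_pvalues = [0.01, 0.04, 0.03, 0.5]  # made-up values, not real data
    for r in (2, 4, 8):
        print(r, fdr_threshold(r), call_de_genes(toy_pvalues, r))
```

Under this reading, four replicates give a cutoff of 0.0625 and eight replicates give about 0.0039, so the threshold becomes stricter as replicates are added.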

Published in Frontiers in Plant Science

ISSN: 1664-462X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Agriculture: Plant culture
Website: https://www.frontiersin.org/journals/plant-science
