Genome Biology (Oct 2024)
Winsorization greatly reduces false positives by popular differential expression methods when analyzing human population samples
Abstract
A recent study found severely inflated type I error rates for DESeq2 and edgeR, two dominant tools for differential expression analysis of RNA-seq data. Here, we show that properly addressing outliers in the RNA-seq data using winsorization substantially reduces the type I error rates of DESeq2 and edgeR, and that their power is comparable to that of the Wilcoxon rank-sum test for large datasets. Therefore, as alternatives to the Wilcoxon rank-sum test, they may still be applied to differential expression analysis of large RNA-seq datasets.
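To make the idea of winsorization concrete, the following is a minimal sketch, not the authors' implementation. It assumes a genes-by-samples count matrix and caps each gene's counts at an upper percentile of that gene's own distribution. The function name `winsorize_counts`, the parameter `upper_pct`, the default of 95, the per-gene (row-wise) capping, and the toy data are all assumptions made for illustration; the abstract does not specify the threshold or the procedure used.

```python
import numpy as np


def winsorize_counts(counts, upper_pct=95.0):
    """Cap each gene's counts at its own upper percentile.

    counts    : 2-D array-like, rows = genes, columns = samples.
    upper_pct : percentile used as the cap (hypothetical default,
                not taken from the source).
    Returns an integer array of the same shape.
    """
    counts = np.asarray(counts, dtype=float)
    # One cap per gene (row); keepdims lets it broadcast across samples.
    caps = np.percentile(counts, upper_pct, axis=1, keepdims=True)
    # Values above the cap are replaced by the cap; others are unchanged.
    capped = np.minimum(counts, caps)
    # Round back to integers, since DESeq2 requires integer counts.
    return np.rint(capped).astype(int)


# Toy example (made-up data): gene 0 has one extreme outlier sample.
counts = np.array([
    [10, 12, 11, 9, 13, 500],
    [0, 1, 0, 2, 1, 1],
])
winsorized = winsorize_counts(counts, upper_pct=80.0)
print(winsorized)
```

In this sketch the extreme value in the first gene is pulled down to the cap while the remaining samples are left as they are, which is the property that limits the influence of outliers on a downstream test. The winsorized matrix would then be passed to DESeq2 or edgeR in place of the raw counts.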