Genome Biology (Sep 2024)

RNAseqCovarImpute: a multiple imputation procedure that outperforms complete case and single imputation differential expression analysis

  • Brennan H. Baker,
  • Sheela Sathyanarayana,
  • Adam A. Szpiro,
  • James W. MacDonald,
  • Alison G. Paquette

DOI
https://doi.org/10.1186/s13059-024-03376-7
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 25

Abstract


Missing covariate data is a common problem that has not been addressed in observational studies of gene expression. Here, we present a multiple imputation method that accommodates high-dimensional gene expression data by incorporating principal component analysis of the transcriptome into the multiple imputation prediction models to avoid bias. Simulation studies using three datasets show that this method outperforms complete-case and single imputation analyses at uncovering true positive differentially expressed genes, limiting false discovery rates, and minimizing bias. This method is easily implemented via the R Bioconductor package RNAseqCovarImpute, which integrates with the limma-voom pipeline for differential expression analysis.
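The abstract describes the procedure only at a high level. The Python sketch below is a rough illustration of the idea. It is not the RNAseqCovarImpute implementation, which runs inside the limma-voom pipeline in R. The sketch assumes a samples × genes matrix of log-scale expression, one continuous covariate with missing values, and a fully observed exposure. It adds the top principal components of the expression matrix as predictors in a Bayesian linear-regression imputation model. It then fits an ordinary least-squares model per gene in each imputed dataset, as a simplified stand-in for limma-voom, and pools the exposure coefficients with Rubin's rules, the standard combining rules for multiple imputation. All function names, the synthetic data, and the defaults (`n_pcs`, `m`) are hypothetical.

```python
import numpy as np


def top_pcs(expr, k):
    """Scores of the first k principal components of a samples x genes matrix."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def impute_once(x, predictors, rng):
    """One 'proper' imputation draw for a continuous covariate: sample the
    residual variance and regression coefficients from their posterior under a
    normal linear model, then add noise to the predicted missing values."""
    miss = np.isnan(x)
    design = np.column_stack([np.ones(len(x)), predictors])
    d_obs, x_obs = design[~miss], x[~miss]
    xtx_inv = np.linalg.inv(d_obs.T @ d_obs)
    beta_hat = xtx_inv @ d_obs.T @ x_obs
    resid = x_obs - d_obs @ beta_hat
    df = len(x_obs) - design.shape[1]
    sigma2 = resid @ resid / rng.chisquare(df)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    completed = x.copy()
    completed[miss] = design[miss] @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
    return completed


def ols_coef_and_var(y, design, j):
    """Per-gene OLS: coefficient j and its sampling variance for every column of y."""
    xtx_inv = np.linalg.inv(design.T @ design)
    coefs = xtx_inv @ design.T @ y                 # shape (p, genes)
    resid = y - design @ coefs
    df = design.shape[0] - design.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / df         # shape (genes,)
    return coefs[j], sigma2 * xtx_inv[j, j]


def mi_pca_de(expr, exposure, covar, n_pcs=2, m=10, seed=0):
    """Impute covar m times using exposure + expression PCs as predictors,
    fit per-gene models, and pool the exposure effect with Rubin's rules."""
    rng = np.random.default_rng(seed)
    predictors = np.column_stack([exposure, top_pcs(expr, n_pcs)])
    ests, variances = [], []
    for _ in range(m):
        covar_i = impute_once(covar, predictors, rng)
        design = np.column_stack([np.ones(len(covar)), exposure, covar_i])
        q, u = ols_coef_and_var(expr, design, j=1)  # column 1 = exposure
        ests.append(q)
        variances.append(u)
    ests, variances = np.array(ests), np.array(variances)
    qbar = ests.mean(axis=0)                       # pooled estimate
    ubar = variances.mean(axis=0)                  # within-imputation variance
    b = ests.var(axis=0, ddof=1)                   # between-imputation variance
    return qbar, ubar + (1 + 1 / m) * b            # Rubin's total variance


# Hypothetical synthetic data: 60 samples, 200 genes, ~20% of covar missing.
rng = np.random.default_rng(1)
n, g = 60, 200
exposure = rng.normal(size=n)
covar = rng.normal(size=n)
expr = rng.normal(size=(n, g)) + 0.5 * covar[:, None]
covar_missing = covar.copy()
covar_missing[rng.random(n) < 0.2] = np.nan

est, var = mi_pca_de(expr, exposure, covar_missing)
print(est.shape, var.shape)  # (200,) (200,)
```

The principal components stand in for the thousands of genes that cannot all enter the imputation model directly, which is the dimension-reduction idea the abstract describes. A real analysis would also need the degrees-of-freedom adjustments and the empirical Bayes variance moderation of limma, which this sketch omits.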

Keywords