A systematic evaluation of single-cell RNA-sequencing imputation methods

Wenpin Hou; Zhicheng Ji; Hongkai Ji; Stephanie C. Hicks

doi:10.1186/s13059-020-02132-x

Genome Biology (Aug 2020)

Affiliations

Wenpin Hou: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
Zhicheng Ji: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
Hongkai Ji: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
Stephanie C. Hicks: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health

DOI: https://doi.org/10.1186/s13059-020-02132-x
Journal volume & issue: Vol. 21, no. 1, pp. 1–30

Abstract

Background: The rapid development of single-cell RNA-sequencing (scRNA-seq) technologies has led to the emergence of many methods for removing systematic technical noises, including imputation methods, which aim to address the increased sparsity observed in single-cell data. Although many imputation methods have been developed, there is no consensus on how methods compare to each other.

Results: Here, we perform a systematic evaluation of 18 scRNA-seq imputation methods to assess their accuracy and usability. We benchmark these methods in terms of the similarity between imputed cell profiles and bulk samples and whether these methods recover relevant biological signals or introduce spurious noise in downstream differential expression, unsupervised clustering, and pseudotemporal trajectory analyses, as well as their computational run time, memory usage, and scalability. Methods are evaluated using data from both cell lines and tissues and from both plate- and droplet-based single-cell platforms.

Conclusions: We found that the majority of scRNA-seq imputation methods outperformed no imputation in recovering gene expression observed in bulk RNA-seq. However, the majority of the methods did not improve performance in downstream analyses compared to no imputation, in particular for clustering and trajectory analysis, and thus should be used with caution. In addition, we found substantial variability in the performance of the methods within each evaluation aspect. Overall, MAGIC, kNN-smoothing, and SAVER were found to outperform the other methods most consistently.

Published in Genome Biology

ISSN: 1474-760X (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Genetics
Website: https://genomebiology.biomedcentral.com/
