Benchmarking principal component analysis for large-scale single-cell RNA-sequencing

Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido

doi:10.1186/s13059-019-1900-3

Genome Biology (Jan 2020)

Affiliations

Koki Tsuyuzaki: Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research
Hiroyuki Sato: Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University
Kenta Sato: Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research
Itoshi Nikaido: Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research

Journal volume & issue: Vol. 21, no. 1
pp. 1 – 17

Abstract

Background: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but for large-scale scRNA-seq datasets the computation takes a long time and consumes large amounts of memory.

Results: In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms.

Conclusion: We develop a guideline to select an appropriate PCA implementation based on the differences in the computational environment of users and developers.
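To make the idea of PCA by randomized singular value decomposition concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not one of the implementations benchmarked in the paper. The function name `randomized_pca`, its parameters (`n_oversamples`, `n_power_iter`, `seed`), and the cells-by-genes orientation of `X` are assumptions made for illustration. The sketch also centers a dense matrix in memory, which a memory-efficient implementation for large scRNA-seq data would avoid.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate PCA of X (rows = cells, columns = genes) via randomized SVD.

    Hypothetical helper for illustration only; returns (scores, components,
    singular_values) for the leading n_components principal components.
    """
    rng = np.random.default_rng(seed)

    # Center each gene (column). This densifies the data, so it is only
    # suitable for matrices that fit in memory.
    Xc = X - X.mean(axis=0)

    # Sketch size: target rank plus a few extra columns for accuracy.
    k = min(n_components + n_oversamples, min(Xc.shape))

    # Project onto a random subspace and orthonormalize the result.
    omega = rng.standard_normal((Xc.shape[1], k))
    Q, _ = np.linalg.qr(Xc @ omega)

    # Power iterations sharpen the separation between leading and trailing
    # singular values; QR after each product keeps the basis stable.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)

    # Exact SVD of the small projected matrix (k x genes).
    B = Q.T @ Xc
    U_small, S, Vt = np.linalg.svd(B, full_matrices=False)

    # Map the left singular vectors back to the original cell space.
    U = Q @ U_small
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], S[:n_components]
```

Calling `randomized_pca(X, 10)` on a cells-by-genes array would return the cell coordinates on the first ten approximate principal components, the corresponding gene loadings, and the singular values. The oversampling and power-iteration settings trade run time against accuracy, and the defaults here are arbitrary choices rather than recommendations from the paper.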

Published in Genome Biology

ISSN: 1474-760X (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Genetics
Website: https://genomebiology.biomedcentral.com/
