Genome Biology (Jan 2020)

Benchmarking principal component analysis for large-scale single-cell RNA-sequencing

  • Koki Tsuyuzaki,
  • Hiroyuki Sato,
  • Kenta Sato,
  • Itoshi Nikaido

DOI: https://doi.org/10.1186/s13059-019-1900-3
Journal volume & issue: Vol. 21, no. 1, pp. 1–17

Abstract


Background: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but on large-scale scRNA-seq datasets it takes a long time to compute and consumes large amounts of memory.

Results: In this work, we review existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace and randomized singular value decomposition (SVD) methods are fast, memory-efficient, and more accurate than the other algorithms.

Conclusion: We develop a guideline for selecting an appropriate PCA implementation according to the computational environment of users and developers.
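The randomized SVD idea can be illustrated with a minimal NumPy sketch of PCA built on a randomized range finder, in the style of Halko, Martinsson and Tropp. This is an illustrative assumption and not one of the implementations benchmarked in this study. The function name `randomized_pca` and its parameters `k`, `oversample`, `n_iter` and `seed` are hypothetical, and the sketch assumes a dense cells × genes matrix that fits in memory.

```python
import numpy as np


def randomized_pca(X, k, oversample=10, n_iter=4, seed=0):
    """Approximate the top-k principal components of X (cells x genes).

    Hypothetical sketch of randomized SVD (random projection + power
    iterations + small exact SVD); not the paper's benchmarked code.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                 # center each gene (column)
    n, p = Xc.shape
    l = min(k + oversample, n, p)           # sketch size with oversampling

    # Random projection: Y spans (approximately) the top-l column space of Xc
    omega = rng.standard_normal((p, l))
    Q, _ = np.linalg.qr(Xc @ omega)

    # Power iterations sharpen the spectral decay; re-orthonormalize each step
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)

    # Exact SVD of the small l x p matrix B = Q^T Xc
    B = Q.T @ Xc
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub                              # lift left singular vectors back to n x l

    scores = U[:, :k] * s[:k]               # PC scores per cell (n x k)
    loadings = Vt[:k].T                     # gene loadings (p x k), orthonormal columns
    return scores, loadings, s[:k]


# Usage on a synthetic low-rank-plus-noise matrix (500 cells x 200 genes)
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 200))
X += 0.01 * rng.standard_normal((500, 200))
scores, loadings, s = randomized_pca(X, k=5)
s_exact = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)[:5]
print(np.allclose(s, s_exact, rtol=1e-3))
```

In this sketch the only exact SVD is of the small l × p matrix `B`, and that is where the time and memory savings come from. The `oversample` and `n_iter` parameters trade extra matrix products for accuracy.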

Keywords