Patterns (Jun 2020)

A Random Matrix Theory Approach to Denoise Single-Cell Data

  • Luis Aparicio,
  • Mykola Bordyuh,
  • Andrew J. Blumberg,
  • Raul Rabadan

Journal volume & issue
Vol. 1, no. 3
p. 100035

Abstract

Summary: Single-cell technologies provide the opportunity to identify new cellular states. However, a major obstacle to the identification of biological signals is noise in single-cell data. In addition, single-cell data are very sparse. We propose a new method based on random matrix theory to analyze and denoise single-cell sequencing data. The method uses the universal distributions predicted by random matrix theory for the eigenvalues and eigenvectors of random covariance/Wishart matrices to distinguish noise from signal. In addition, we explain how sparsity can cause spurious eigenvector localization, falsely identifying meaningful directions in the data. We show that roughly 95% of the information in single-cell data is compatible with the predictions of random matrix theory, about 3% is spurious signal induced by sparsity, and only the last 2% reflects true biological signal. We demonstrate the effectiveness of our approach by comparing with alternative techniques in a variety of examples with marked cell populations.

The Bigger Picture: Single-cell technologies are able to capture information of a biological system cell by cell. Such a level of precision is changing the way we understand complex systems such as cancer or the immune system. However, a major challenge in studying single-cell systems and their underlying biological phenomena is their inherently noisy nature due to their complexity. Random matrix theory is a field with many applications in different branches of mathematics and physics. In the words of one of its developers, the theoretical physicist Freeman Dyson, it describes a “black box in which a large number of particles are interacting according to unknown laws.” A complex system with a large number of components (such as genes, biomolecules, or cells) interacting according to unknown laws is the epitome of systems biology. Therefore, random matrix theory looks like a suitable framework to mathematically describe the noise and complexity of gene-cell expression data coming from single-cell biology.
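
The eigenvalue test mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it only compares the eigenvalues of a standardized gene-gene covariance matrix with the upper edge of the Marchenko-Pastur distribution, which is the standard random matrix theory prediction for a pure-noise Wishart matrix. The function names `mp_upper_edge` and `count_signal_eigenvalues` are hypothetical, and the sketch assumes unit-variance, independent noise. It ignores the sparsity-induced eigenvector localization that the paper addresses.

```python
import numpy as np


def mp_upper_edge(n_cells, n_genes, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur support.

    Applies to the eigenvalues of Z.T @ Z / n_cells when Z is an
    n_cells x n_genes matrix of i.i.d. entries with variance sigma2.
    """
    gamma = n_genes / n_cells
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2


def count_signal_eigenvalues(X):
    """Count covariance eigenvalues above the Marchenko-Pastur edge.

    X is a cells x genes expression matrix (hypothetical input layout).
    Returns (count, eigenvalues in ascending order, edge).
    """
    X = np.asarray(X, dtype=float)
    # Drop genes with zero variance so standardization is well defined.
    std = X.std(axis=0)
    X = X[:, std > 0]
    n_cells, n_genes = X.shape
    # Standardize each gene to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = Z.T @ Z / n_cells
    # eigvalsh handles symmetric matrices and returns ascending eigenvalues.
    eigvals = np.linalg.eigvalsh(cov)
    edge = mp_upper_edge(n_cells, n_genes)
    return int(np.sum(eigvals > edge)), eigvals, edge


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal((2000, 200))
    # Planted structure: half of the cells over-express the first 20 genes.
    planted = noise.copy()
    planted[:1000, :20] += 3.0
    for name, data in [("noise only", noise), ("planted signal", planted)]:
        k, vals, edge = count_signal_eigenvalues(data)
        print(name, "eigenvalues above edge:", k, "largest:", round(vals[-1], 2))
```

Eigenvalues below the edge are compatible with noise under these assumptions, while eigenvalues clearly above it point to candidate signal directions. Because the largest noise eigenvalue fluctuates around the edge at finite size, a practical test would use a tolerance rather than a hard cutoff.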

Keywords