PLoS ONE (Jan 2018)
Computational de novo discovery of distinguishing genes for biological processes and cell types in complex tissues.
Abstract
Bulk tissue samples examined by gene expression studies are usually heterogeneous. The data obtained from these samples display the confounding patterns of mixtures consisting of multiple cell types or similar cell types in various functional states, which hinders the elucidation of the molecular mechanisms underlying complex biological phenomena. A realistic approach to compensate for the limitations of experimentally separating homogeneous cell populations from mixed tissues is to computationally identify cell-type-specific patterns from bulk, heterogeneous measurements. We designed the CellDistinguisher algorithm to analyze the gene expression data of mixed samples and identify the genes that best distinguish biological processes and cell types. We show that, when coupled with a deconvolution algorithm that takes cell-type-specific gene lists as input, CellDistinguisher performs as well as partial deconvolution algorithms in predicting cell type composition, without the need for prior knowledge of cell type signatures. This approach is also better at predicting cell type signatures than traditional one-step complete deconvolution methods. To illustrate its wide applicability, the algorithm was tested on multiple publicly available data sets. In each case, CellDistinguisher identified genes reflecting biological processes typical of the tissues and developmental stages of interest and estimated the sample compositions accurately.
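The role that distinguishing genes play in estimating sample composition can be illustrated with a toy linear mixing model. The sketch below is not the CellDistinguisher algorithm, whose details are not given in this abstract; it only assumes that bulk expression is a fraction-weighted sum of pure cell-type signatures, that each marker gene is silent outside its own cell type, and that markers have comparable expression within their own type. All gene names, function names, and numbers are hypothetical.

```python
# Toy linear mixing model (illustrative assumption, not the published method):
# bulk expression = sum over cell types of (pure signature * type fraction).

# Hypothetical signatures: gene -> expression in pure cell types (A, B).
signatures = {
    "geneA_marker": (100.0, 0.0),   # expressed only in type A
    "geneB_marker": (0.0, 100.0),   # expressed only in type B
    "shared_gene": (50.0, 80.0),    # expressed in both, so not distinguishing
}


def mix(fractions):
    """Simulate one bulk sample from per-type fractions that sum to 1."""
    return {
        gene: sum(level * frac for level, frac in zip(sig, fractions))
        for gene, sig in signatures.items()
    }


def estimate_fractions(bulk, markers):
    """Estimate cell-type fractions from one marker gene per cell type.

    Relies on the stated assumptions: each marker is silent in every other
    type and markers are equally expressed in their own type, so the
    normalized marker signal is proportional to the cell-type fraction.
    """
    signal = [bulk[marker] for marker in markers]
    total = sum(signal)
    return [value / total for value in signal]


bulk = mix((0.3, 0.7))
estimated = estimate_fractions(bulk, ["geneA_marker", "geneB_marker"])
print(estimated)
```

The shared gene carries no usable composition signal on its own here, which is why gene lists that separate the cell types matter for the downstream deconvolution step. With real data the marker assumptions hold only approximately, so a constrained regression over many genes would normally replace this simple normalization.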