Patterns (May 2024)
DeepDecon accurately estimates cancer cell fractions in bulk RNA-seq data
Abstract
Summary: Understanding the cellular composition of a disease-related tissue is important in disease diagnosis, prognosis, and downstream treatment. Recent advances in single-cell RNA-sequencing (scRNA-seq) techniques have allowed the measurement of gene expression profiles for individual cells. However, scRNA-seq is still too expensive for large-scale population studies, where bulk RNA-seq remains widely used. An essential challenge is to deconvolve the cellular composition of bulk RNA-seq data based on scRNA-seq data. Here, we present DeepDecon, a deep neural network model that leverages single-cell gene expression information to accurately predict the fraction of cancer cells in bulk tissues. It provides a refining strategy in which the cancer cell fraction is iteratively estimated by a set of trained models. When applied to simulated and real cancer data, DeepDecon exhibits superior accuracy compared to existing decomposition methods.

The bigger picture: Estimating the malignant cell fraction accurately and cheaply is essential for cancer diagnosis and prognosis. Although single-cell RNA sequencing (scRNA-seq) can provide accurate information on the malignant cell fraction, it is too labor intensive and expensive for clinical application. Bulk RNA-seq, on the other hand, is cost effective and widely used in clinical settings but traditionally provides only the average gene expression profile of a cell population. Using reference malignant and normal scRNA-seq data, DeepDecon provides an iterative deep-learning-based computational method for accurate estimation of the fraction of malignant cells from bulk-averaged gene expression profiles. This study used DeepDecon to accurately estimate acute myeloid leukemia (AML), neuroblastoma, and head-and-neck squamous cell carcinoma (HNSCC) cell fractions from bulk RNA-seq data.
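To make the idea of bulk-averaged expression concrete, the following minimal sketch shows one common way to build a simulated bulk sample with a known malignant fraction from reference single-cell profiles. It is an illustration under stated assumptions, not DeepDecon's actual simulation procedure: the function name `simulate_pseudo_bulk`, its parameters, and the toy expression values are all hypothetical, and real pipelines typically add normalization and noise steps that are omitted here.

```python
import random


def simulate_pseudo_bulk(malignant_cells, normal_cells, fraction, n_cells=100, seed=0):
    """Average the expression of n_cells sampled cells, a given fraction malignant.

    Hypothetical helper for illustration only. Each cell is a list of
    per-gene expression values; all cells must share the same gene order.
    Returns the averaged profile and the malignant fraction actually realized.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_malignant = round(fraction * n_cells)
    n_normal = n_cells - n_malignant
    # Sample with replacement so that small reference sets still work.
    sampled = (rng.choices(malignant_cells, k=n_malignant)
               + rng.choices(normal_cells, k=n_normal))
    n_genes = len(sampled[0])
    # A bulk profile is modeled here as the per-gene mean over sampled cells.
    bulk = [sum(cell[g] for cell in sampled) / n_cells for g in range(n_genes)]
    return bulk, n_malignant / n_cells


# Toy reference data (made-up values): two genes, one specific to each cell type.
toy_malignant = [[10.0, 0.0], [10.0, 0.0]]
toy_normal = [[0.0, 4.0]]
bulk_profile, true_fraction = simulate_pseudo_bulk(
    toy_malignant, toy_normal, fraction=0.3, n_cells=10
)
```

Pairs of `bulk_profile` and `true_fraction` produced this way are the kind of labeled examples on which a fraction-predicting model could be trained and evaluated, since the ground-truth fraction is known by construction.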