Patterns (May 2024)

DeepDecon accurately estimates cancer cell fractions in bulk RNA-seq data

  • Jiawei Huang,
  • Yuxuan Du,
  • Andres Stucky,
  • Kevin R. Kelly,
  • Jiang F. Zhong,
  • Fengzhu Sun

Journal volume & issue
Vol. 5, no. 5
p. 100969

Abstract

Summary: Understanding the cellular composition of a disease-related tissue is important in disease diagnosis, prognosis, and downstream treatment. Recent advances in single-cell RNA-sequencing (scRNA-seq) techniques have allowed the measurement of gene expression profiles for individual cells. However, scRNA-seq is still too expensive for large-scale population studies, where bulk RNA-seq remains widely used. An essential challenge is to deconvolve the cellular composition of bulk RNA-seq data based on scRNA-seq data. Here, we present DeepDecon, a deep neural network model that leverages single-cell gene expression information to accurately predict the fraction of cancer cells in bulk tissues. It provides a refining strategy in which the cancer cell fraction is iteratively estimated by a set of trained models. When applied to simulated and real cancer data, DeepDecon exhibits superior accuracy compared to existing decomposition methods.

The bigger picture: Estimating the malignant cell fraction accurately and cheaply is essential for cancer diagnosis and prognosis. Although single-cell RNA sequencing (scRNA-seq) can provide accurate information on the malignant cell fraction, it is too labor intensive and expensive for clinical application. Bulk RNA-seq, on the other hand, is cost effective and widely used in clinical settings but traditionally provides only the average gene expression profile of a cell population. Using reference malignant and normal scRNA-seq data, DeepDecon provides an iterative deep-learning-based computational method for accurate estimation of the fraction of malignant cells from bulk-averaged gene expression profiles. This study used DeepDecon to accurately estimate acute myeloid leukemia (AML), neuroblastoma, and head-and-neck squamous cell carcinoma (HNSCC) cell fractions from bulk RNA-seq data.
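The refining strategy mentioned in the summary, in which a set of trained models iteratively re-estimates the cancer cell fraction, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the models are organized, so the sketch assumes that each model is paired with the fraction interval it was trained on and that the first model covers the full range from 0 to 1. The names refine_fraction, clamp, and toy_models are hypothetical, and the toy models are simple stand-in functions rather than trained neural networks.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" is any callable that maps a bulk expression vector to a
# predicted malignant fraction. In practice these would be trained neural
# networks; here they are hypothetical stand-ins.
Model = Callable[[Sequence[float]], float]
Interval = Tuple[float, float]


def clamp(x: float) -> float:
    """Restrict a predicted fraction to the valid range [0, 1]."""
    return min(1.0, max(0.0, x))


def refine_fraction(
    bulk: Sequence[float],
    models: List[Tuple[Interval, Model]],
    max_iter: int = 10,
) -> float:
    """Iteratively re-estimate a fraction with interval-specific models.

    Assumption (not stated in the source): models[0] covers [0, 1], and
    every other model was trained on samples whose true fraction lies in
    its own, narrower interval.
    """
    current = 0
    estimate = clamp(models[0][1](bulk))
    for _ in range(max_iter):
        # Intervals that contain the current estimate; models[0] always does.
        candidates = [
            i for i, ((lo, hi), _) in enumerate(models) if lo <= estimate <= hi
        ]
        # Prefer the narrowest matching interval.
        narrowest = min(candidates, key=lambda i: models[i][0][1] - models[i][0][0])
        if narrowest == current:
            break  # the same model would be chosen again, so stop
        current = narrowest
        estimate = clamp(models[current][1](bulk))
    return estimate


# Toy stand-ins: a single "marker" value plays the role of the bulk profile,
# and the full-range model is deliberately more biased than the narrow ones.
toy_models: List[Tuple[Interval, Model]] = [
    ((0.0, 1.0), lambda bulk: bulk[0] + 0.10),
    ((0.0, 0.5), lambda bulk: bulk[0] + 0.02),
    ((0.5, 1.0), lambda bulk: bulk[0] - 0.02),
]

estimate = refine_fraction([0.30], toy_models)
print(round(estimate, 2))
```

In this toy setup the first pass uses the full-range model, and the second pass switches to the model whose interval contains that first estimate. The loop stops once the selected model no longer changes, and max_iter bounds the number of passes in case estimates oscillate between neighboring intervals.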

Keywords