Discover Oncology (Sep 2023)

Integrating TCGA and single-cell sequencing data for colorectal cancer: a 10-gene prognostic risk assessment model

  • Di Lu,
  • Xiaofang Li,
  • Yuan Yuan,
  • Yaqi Li,
  • Jiannan Wang,
  • Qian Zhang,
  • Zhiyu Yang,
  • Shanjun Gao,
  • Xiulei Zhang,
  • Bingxi Zhou

DOI
https://doi.org/10.1007/s12672-023-00789-x
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 17

Abstract

Colorectal cancer represents a significant health threat, yet a standardized method for early clinical assessment and prognosis remains elusive. This study sought to address this gap by using the Seurat package to analyze a single-cell sequencing dataset (GSE178318) of colorectal cancer, thereby identifying distinctive marker genes characterizing various cell subpopulations. CIBERSORT analysis of colorectal cancer data from The Cancer Genome Atlas (TCGA) database revealed significant differences in both cell subpopulations and their prognostic values. Using WGCNA, we identified modules strongly correlated with these subpopulations, then used the coxph function of the survival package to select genes within these modules. Further stratification of the TCGA dataset based on these selected genes revealed notable variations between subtypes. The prognostic relevance of the resulting differentially expressed genes was assessed through survival analysis, with LASSO regression employed for modeling prognostic factors. The resulting model, built on a 10-gene signature derived from these differentially expressed genes by LASSO regression, accurately predicted clinical prognosis, including when tested on external datasets. Specifically, natural killer cells from the C7 subpopulation were significantly associated with colorectal cancer survival and prognosis in the TCGA database. These findings underscore the promise of a 10-gene signature prognostic risk assessment model that integrates single-cell sequencing insights with TCGA data for estimating the risk associated with colorectal cancer.
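As a rough illustration of how a gene-signature risk model of this kind is typically applied, the sketch below computes a linear risk score (the sum of each gene's expression multiplied by its regression coefficient) and splits samples into high- and low-risk groups. The abstract does not give the coefficients, the gene list, or the cutoff rule, so the random expression matrix, the placeholder coefficients, and the median split used here are all assumptions made for illustration, not the authors' values or procedure.

```python
import numpy as np


def risk_scores(expression, coefficients):
    """Linear risk score per sample: sum over genes of coefficient * expression.

    expression:   (n_samples, n_genes) array of expression values
    coefficients: (n_genes,) array of per-gene weights (e.g. from a LASSO-Cox fit)
    """
    expression = np.asarray(expression, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    if expression.ndim != 2 or expression.shape[1] != coefficients.shape[0]:
        raise ValueError("expression must be (n_samples, n_genes) matching coefficients")
    return expression @ coefficients


def split_by_median(scores):
    """Label samples 'high' or 'low' risk relative to the median score.

    The median cutoff is an assumption; the abstract does not state the rule used.
    """
    scores = np.asarray(scores, dtype=float)
    cutoff = float(np.median(scores))
    groups = np.where(scores > cutoff, "high", "low")
    return groups, cutoff


# Placeholder data: 6 hypothetical samples x 10 hypothetical signature genes.
rng = np.random.default_rng(0)
expression = rng.normal(size=(6, 10))
coefficients = np.linspace(-0.5, 0.5, 10)  # illustrative weights, not fitted values

scores = risk_scores(expression, coefficients)
groups, cutoff = split_by_median(scores)
```

In practice the coefficients would come from the fitted model and the two groups would then be compared with a survival analysis; neither step is reproduced here.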

Keywords