Biological Procedures Online (Sep 2022)

Single-cell and WGCNA uncover a prognostic model and potential oncogenes in colorectal cancer

  • Ziyang Di,
  • Sicheng Zhou,
  • Gaoran Xu,
  • Lian Ren,
  • Chengxin Li,
  • Zheyu Ding,
  • Kaixin Huang,
  • Leilei Liang,
  • Yihang Yuan

DOI
https://doi.org/10.1186/s12575-022-00175-x
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 17

Abstract

Background: Colorectal cancer (CRC) is one of the leading causes of cancer-related death worldwide. Single-cell transcriptome sequencing (scRNA-seq) can provide accurate gene expression data for individual cells. In this study, a new prognostic model was constructed from scRNA-seq and bulk transcriptome sequencing (bulk RNA-seq) data of CRC samples to develop a new understanding of CRC.

Methods: CRC scRNA-seq data were downloaded from the GSE161277 dataset, and CRC bulk RNA-seq data were downloaded from TCGA and the GSE17537 dataset. Cells in the scRNA-seq data were clustered with the FindNeighbors and FindClusters functions. CIBERSORTx was applied to estimate the abundance of each cell cluster in the bulk RNA-seq expression matrix. WGCNA was performed on the expression profiles to construct the gene coexpression networks of TCGA-CRC. Next, we used tenfold cross-validation to construct the model and a nomogram to assess the independence of the model for clinical application. Finally, we examined the expression of the previously unreported model genes by qPCR and immunohistochemistry. A clone formation assay and an orthotopic colorectal tumour model were used to assess the regulatory roles of these genes.

Results: A total of 43,851 cells were included after quality control, and 20 cell clusters were identified by the FindClusters() function. We found that the abundances of C1, C2, C4, C5, C15, C16 and C19 were high and the abundances of C7, C10, C11, C13, C14 and C17 were low in CRC tumour tissues. Meanwhile, survival analysis showed that high abundances of C4, C11 and C13 and low abundances of C5 and C14 were associated with better survival. The WGCNA results showed that the red module, which contains 615 genes, was the module most strongly related to the tumour and the C14 cluster. Lasso Cox regression analysis identified 8 genes (PBXIP1, MPZ, SCARA3, INA, ILK, MPP2, L1CAM and FLNA), which were chosen to construct a risk model. In the model, the risk score had the greatest impact on survival prediction, indicating that the 8-gene risk model can predict prognosis well. qPCR and immunohistochemistry analysis showed that the expression levels of MPZ, SCARA3, MPP2 and PBXIP1 were high in CRC tissues. The functional experiments indicated that MPZ, SCARA3, MPP2 and PBXIP1 could promote the colony formation ability of CRC cells in vitro and tumorigenicity in vivo.

Conclusions: We constructed a risk model to predict the prognosis of CRC patients based on scRNA-seq and bulk RNA-seq data, which could be used for clinical application. We also identified 4 previously unreported model genes (MPZ, SCARA3, MPP2 and PBXIP1) as novel oncogenes in CRC. These results suggest that this model could potentially be used to evaluate prognostic risk and provide potential therapeutic targets for CRC patients.
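
As an illustration of how a gene-based risk model of this kind is commonly applied, the sketch below computes a risk score as a weighted sum of expression values and splits a cohort into high- and low-risk groups. It is a minimal sketch under stated assumptions, not the study's implementation: the coefficient values are hypothetical placeholders (the abstract does not report the fitted coefficients), and the median cutoff and the function names `risk_score` and `stratify` are assumptions introduced here.

```python
from statistics import median

# HYPOTHETICAL coefficients for illustration only. These are NOT the
# values fitted in the study; the abstract does not report them.
EXAMPLE_COEFFICIENTS = {
    "PBXIP1": 0.1, "MPZ": 0.1, "SCARA3": 0.1, "INA": 0.1,
    "ILK": 0.1, "MPP2": 0.1, "L1CAM": 0.1, "FLNA": 0.1,
}


def risk_score(expression, coefficients=EXAMPLE_COEFFICIENTS):
    """Return the linear predictor: sum of coefficient * expression per gene."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(scores):
    """Label each sample 'high' or 'low' risk relative to the cohort median.

    The median cutoff is an assumption; other cutoffs are also used in practice.
    """
    cutoff = median(scores.values())
    return {
        sample: ("high" if score > cutoff else "low")
        for sample, score in scores.items()
    }
```

In use, `risk_score` would be called once per patient with that patient's expression values for the eight model genes, and the resulting scores passed to `stratify` before a survival comparison between the two groups.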

Keywords