World Journal of Surgical Oncology (Jun 2024)

Decoding temporal heterogeneity in NSCLC through machine learning and prognostic model construction

  • Junpeng Cheng,
  • Meizhu Xiao,
  • Qingkang Meng,
  • Min Zhang,
  • Denan Zhang,
  • Lei Liu,
  • Qing Jin,
  • Zhijin Fu,
  • Yanjiao Li,
  • Xiujie Chen,
  • Hongbo Xie

DOI
https://doi.org/10.1186/s12957-024-03435-0
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 18

Abstract

Background: Non-small cell lung cancer (NSCLC) is a prevalent and heterogeneous disease with significant genomic variation between the early and advanced stages. Identifying the key genes and pathways that drive NSCLC tumor progression is critical for improving diagnosis and treatment outcomes.

Methods: We conducted single-cell transcriptome analysis on 93,406 cells from 22 NSCLC patients to characterize malignant NSCLC cells. Using cNMF, we classified these cells into distinct modules that capture the diverse molecular profiles within NSCLC. Through pseudotime analysis, we delineated temporal changes in gene expression during NSCLC evolution and identified genes associated with disease progression. Using an XGBoost model, we assessed the importance of these genes in the pseudotime trajectory. We validated our findings with transcriptome sequencing data from The Cancer Genome Atlas (TCGA) and applied LASSO regression to refine the selection of characteristic genes. We then established a risk score model based on these genes, providing a potential tool for cancer risk assessment and personalized treatment strategies.

Results: cNMF classified malignant NSCLC cells into three functional modules: a metabolic reprogramming module, a cell cycle module, and a cell stemness module. These modules can be used for the functional classification of malignant tumor cells in NSCLC, and they indicate that metabolism, the cell cycle, and tumor stemness play important driving roles in the malignant evolution of NSCLC. We integrated cNMF and XGBoost to select marker genes indicative of both early and advanced NSCLC stages. At the single-cell level, the expression of genes such as CHCHD2, GAPDH, and CD24 was strongly correlated with the malignant evolution of NSCLC. These genes were further validated with histological data. The risk score model that we established, which comprises eight genes, was ultimately validated with GEO data.

Conclusion: Our study contributes to the identification of temporally heterogeneous biomarkers in NSCLC, offering insights into disease progression mechanisms and potential therapeutic targets. The developed workflow shows promise for future application in clinical practice.
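The abstract does not state the form of the risk score, its coefficients, or the eight genes. As a rough illustration only, the sketch below assumes the common convention for LASSO-derived signatures: a patient's score is the coefficient-weighted sum of gene expression values, and patients are split into high- and low-risk groups at the cohort median. The gene names, coefficients, expression values, and the helper names `risk_scores` and `split_by_median` are all hypothetical.

```python
import numpy as np


def risk_scores(expr, coefs):
    """Return one score per patient as the coefficient-weighted sum of expression.

    expr  : array of shape (n_patients, n_genes)
    coefs : array of shape (n_genes,), e.g. non-zero LASSO coefficients
    """
    expr = np.asarray(expr, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    if expr.ndim != 2 or coefs.ndim != 1 or expr.shape[1] != coefs.shape[0]:
        raise ValueError("expr must be (n_patients, n_genes) and coefs (n_genes,)")
    return expr @ coefs


def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    scores = np.asarray(scores, dtype=float)
    cutoff = float(np.median(scores))
    groups = np.where(scores > cutoff, "high", "low")
    return groups, cutoff


# Hypothetical three-gene signature (placeholders, not the study's eight genes).
genes = ["GENE_A", "GENE_B", "GENE_C"]
coefs = [0.5, -0.25, 1.0]  # assumed coefficients, for illustration only

# Hypothetical expression matrix: rows are patients, columns follow `genes`.
expr = [
    [2.0, 4.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 4.0, 3.0],
    [4.0, 4.0, 0.0],
]

scores = risk_scores(expr, coefs)
groups, cutoff = split_by_median(scores)

for i, (score, group) in enumerate(zip(scores, groups), start=1):
    print(f"patient {i}: score={score:.2f} group={group}")
```

A median cutoff is one of several conventions; a study may instead choose a threshold that optimizes survival separation, so the split rule here should be read as an assumption rather than the authors' method.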

Keywords