Cancer Cell International (Apr 2023)

Molecular subtypes predict therapeutic responses and identifying and validating diagnostic signatures based on machine learning in chronic myeloid leukemia

  • Fang-Min Zhong,
  • Fang-Yi Yao,
  • Yu-Lin Yang,
  • Jing Liu,
  • Mei-Yong Li,
  • Jun-Yao Jiang,
  • Nan Zhang,
  • Yan-Mei Xu,
  • Shu-Qi Li,
  • Ying Cheng,
  • Shuai Xu,
  • Bo Huang,
  • Xiao-Zhong Wang

DOI
https://doi.org/10.1186/s12935-023-02905-x
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 14

Abstract

Chronic myeloid leukemia (CML) is a hematological tumor derived from hematopoietic stem cells. This study aimed to analyze the biological characteristics of CML and identify diagnostic markers. We obtained expression profiles from the Gene Expression Omnibus (GEO) database and identified 210 differentially expressed genes (DEGs) between CML and normal samples. These DEGs are mainly enriched in immune-related pathways such as Th1 and Th2 cell differentiation, primary immunodeficiency, T cell receptor signaling, and antigen processing and presentation. Based on these DEGs, we identified two molecular subtypes using a consensus clustering algorithm. Cluster A was an immunosuppressive phenotype with reduced immune cell infiltration and significant activation of metabolism-related pathways such as reactive oxygen species, glycolysis and mTORC1; Cluster B was an immune-activating phenotype with increased infiltration of CD4+ and CD8+ T cells and NK cells, and increased activation of signaling pathways such as interferon gamma (IFN-γ) response, IL6-JAK-STAT3 and inflammatory response. Drug prediction results showed that patients in Cluster B were predicted to respond better to anti-PD-1 and anti-CTLA4 therapy and to be more sensitive to imatinib, nilotinib and dasatinib. Support Vector Machine Recursive Feature Elimination (SVM-RFE), Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest (RF) algorithms identified 4 CML diagnostic genes (HDC, SMPDL3A, IRF4 and AQP3), and the risk score model constructed from these genes improved diagnostic accuracy. We further validated the diagnostic value of the 4 genes and the risk score model in a clinical cohort; the risk score can be used in the differential diagnosis of CML from other hematological malignancies. The risk score can also be used to identify molecular subtypes and predict response to imatinib treatment.
These results reveal the characteristics of immunosuppression and metabolic reprogramming in CML patients, and the identified molecular subtypes and biomarkers provide new insights for the clinical diagnosis and treatment of CML.
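
The gene-selection step described above can be illustrated with a minimal Python sketch. It assumes that the diagnostic genes are those retained by all three algorithms and that the risk score is a weighted sum of expression values; the abstract states neither detail, so both are assumptions. The function names, the `GENE_*` filler entries, the weights and the expression values are all hypothetical placeholders, not data from the study.

```python
from typing import Dict, Set


def consensus_genes(*selections: Set[str]) -> Set[str]:
    """Return the genes chosen by every feature-selection method."""
    return set.intersection(*selections) if selections else set()


def risk_score(expression: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of expression values over the signature genes.

    Assumption: a linear score; the abstract does not give the formula.
    """
    return sum(weights[gene] * expression[gene] for gene in weights)


# Placeholder selector outputs. Only the four shared genes come from the
# abstract; the GENE_* names are invented fillers for illustration.
svm_rfe = {"HDC", "SMPDL3A", "IRF4", "AQP3", "GENE_A"}
lasso = {"HDC", "SMPDL3A", "IRF4", "AQP3", "GENE_B"}
random_forest = {"HDC", "SMPDL3A", "IRF4", "AQP3", "GENE_A", "GENE_C"}

signature = consensus_genes(svm_rfe, lasso, random_forest)

# Hypothetical weights and expression values (signs and magnitudes are
# arbitrary, chosen only to make the arithmetic easy to follow).
weights = {"HDC": 1.0, "SMPDL3A": 1.0, "IRF4": -1.0, "AQP3": -1.0}
sample = {"HDC": 2.0, "SMPDL3A": 1.5, "IRF4": 0.5, "AQP3": 1.0}

score = risk_score(sample, weights)
print(sorted(signature), score)
```

In practice the three selector outputs would come from fitted models (for example, recursive feature elimination around a linear SVM, an L1-penalized regression, and random forest importances), and the weights would be estimated from training data rather than set by hand.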

Keywords