Clinical and Translational Science (Oct 2022)

A novel analytical framework for risk stratification of real‐world data using machine learning: A small cell lung cancer study

  • Luca Marzano,
  • Adam S. Darwich,
  • Salomon Tendler,
  • Asaf Dan,
  • Rolf Lewensohn,
  • Luigi De Petris,
  • Jayanth Raghothama,
  • Sebastiaan Meijer

DOI
https://doi.org/10.1111/cts.13371
Journal volume & issue
Vol. 15, no. 10
pp. 2437 – 2447

Abstract

In recent studies, small cell lung cancer (SCLC) treatment guidelines were based on the Veterans’ Administration Lung Study Group limited/extensive disease staging, which resulted in broad and inseparable prognostic subgroups. Evidence suggests that the eighth edition of tumor, node, and metastasis (TNM) staging can play an important role in addressing this issue. The aim of the present study was to improve the detection of prognostic subgroups in a real‐world data (RWD) cohort of patients and to analyze their patterns, using machine learning methods in a pipeline developed with thoracic oncologists. The method detected subgroups of patients by informing unsupervised learning (partitioning around medoids) with the impact of covariates on prognosis (Cox regression and random survival forest). The analysis was carried out on patients with SCLC (n = 636) with stage IIIA–IVB disease according to the TNM classification. It yielded k = 7 compact and well‐separated clusters of patients. Performance status (Eastern Cooperative Oncology Group‐Performance Status), lactate dehydrogenase, spread of metastasis, cancer stage, and C‐reactive protein (CRP) were the baseline characteristics that distinguished the subgroups. The selected clustering method outperformed standard clustering techniques, which were not capable of detecting meaningful subgroups. From the analysis of treatment decisions within clusters, we showed the potential of future RWD applications to understand disease, develop individualized therapies, and improve healthcare decision making.
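For readers unfamiliar with partitioning around medoids, the clustering step named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `pam` and `total_cost`, the random initialization, the first-improvement swap search, the Euclidean distance, and the six toy points are all assumptions made for illustration. The abstract does not state the distance measure used in the study, nor how covariate effects entered the clustering.

```python
import numpy as np

def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return dist[:, medoids].min(axis=1).sum()

def pam(dist, k, seed=0):
    """Swap-based k-medoids on a precomputed distance matrix.

    Illustrative simplification: medoids start from a random draw
    (no BUILD phase) and any improving swap is accepted immediately.
    """
    n = dist.shape[0]
    rng = np.random.default_rng(seed)
    medoids = [int(m) for m in rng.choice(n, size=k, replace=False)]
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                # Try replacing medoid i with non-medoid point h.
                candidate = medoids.copy()
                candidate[i] = h
                cost = total_cost(dist, candidate)
                if cost < best - 1e-12:
                    medoids, best, improved = candidate, cost, True
    # Assign every point to its nearest medoid.
    labels = dist[:, medoids].argmin(axis=1)
    return np.array(medoids), labels, best

# Toy data (hypothetical): two well-separated groups of three points.
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
# Pairwise Euclidean distances.
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

medoids, labels, cost = pam(dist, k=2)
print(sorted(medoids.tolist()), labels.tolist(), cost)
```

Because the medoids are actual observations, each cluster can be summarized by a representative patient record, which is one reason medoid-based methods are attractive for clinical data with mixed variable types when paired with a suitable dissimilarity measure.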