Machine learning identifies prognostic subtypes of the tumor microenvironment of NSCLC

Duo Yu; Michael J. Kane; Eugene J. Koay; Ignacio I. Wistuba; Brian P. Hobbs

doi:10.1038/s41598-024-64977-7

Scientific Reports (Jul 2024)

Affiliations

Duo Yu: Division of Biostatistics, Institute for Health & Equity, Medical College of Wisconsin
Michael J. Kane: Department of Biostatistics, Yale School of Public Health
Eugene J. Koay: Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center
Ignacio I. Wistuba: Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center
Brian P. Hobbs: Department of Population Health, Dell Medical School, The University of Texas at Austin

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 9

Abstract

The tumor microenvironment (TME) plays a fundamental role in tumorigenesis, tumor progression, and the anti-cancer immune potential of emerging cancer therapeutics. Understanding inter-patient TME heterogeneity, however, remains a challenge for efficient drug development. This article applies recent advances in machine learning (ML) for survival analysis to a retrospective study of NSCLC patients who underwent definitive surgical resection followed by immune pathology evaluation, and compares ML methods for their effectiveness in identifying prognostic subtypes. Six survival models, including Cox regression and five survival ML methods, were calibrated and applied to predict survival for NSCLC patients based on PD-L1 expression, CD3 expression, and ten baseline patient characteristics. For each method, prognostic subregions of the biomarker space are delineated using synthetic patient data augmentation, and the models are compared by their concordance for overall survival. A total of 423 NSCLC patients (46% female; median age [interquartile range]: 67 [60–73] years) treated with definitive surgical resection were included in the study. Of these, 219 (52%) experienced events during the observation period, which had a maximum follow-up of 10 years and a median follow-up of 78 months. The random survival forest (RSF) achieved the highest predictive accuracy, with a C-index of 0.84. The resulting biomarker subtypes indicate that patients with high PD-L1 expression combined with low CD3 counts have a higher risk of death within five years of surgical resection.
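To make the reported C-index of 0.84 concrete, the sketch below computes Harrell's concordance index for right-censored survival data: among all comparable patient pairs (the patient with the shorter follow-up died), it measures the fraction in which the model assigned the higher risk score to the patient who died first. A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking. This is a minimal illustration only; the abstract does not state which C-index variant or software the authors used, the function name `harrell_c_index` is hypothetical, and tied follow-up times are simply treated as non-comparable here.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times  : observed follow-up times (any consistent unit, e.g. months)
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk scores, where higher means worse prognosis
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have equal length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored patient cannot be the earlier-failing member of a pair.
            continue
        for j in range(n):
            # Pair (i, j) is comparable if i died strictly before j's follow-up ended.
            # Tied times are skipped in this simplified version.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # model ranked the earlier death as riskier
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data (not from the study): four patients, one censored.
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.3, 0.2, 0.4]
    print(harrell_c_index(times, events, risks))  # 4 of 5 comparable pairs concordant
```

In the toy example, five pairs are comparable and only the pair (10 months, 20 months) is ranked the wrong way, giving a C-index of 0.8.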

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/