Comparing Performances of Predictive Models of Toxicity after Radiotherapy for Breast Cancer Using Different Machine Learning Approaches

Maria Giulia Ubeira-Gabellini; Martina Mori; Gabriele Palazzo; Alessandro Cicchetti; Paola Mangili; Maddalena Pavarini; Tiziana Rancati; Andrei Fodor; Antonella del Vecchio; Nadia Gisella Di Muzio; Claudio Fiorino

doi:10.3390/cancers16050934

Cancers (Feb 2024)

Affiliations

Maria Giulia Ubeira-Gabellini: Medical Physics, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
Martina Mori: Medical Physics, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
Gabriele Palazzo: Medical Physics, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
Alessandro Cicchetti: Data Science Unit, Fondazione IRCCS Istituto Nazionale dei Tumori, 20133 Milan, Italy
Paola Mangili: Medical Physics, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
Maddalena Pavarini: Medical Physics, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
Tiziana Rancati: Data Science Unit, Fondazione IRCCS Istituto Nazionale dei Tumori, 20133 Milan, Italy
Andrei Fodor: Radiotherapy, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
Antonella del Vecchio: Medical Physics, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
Nadia Gisella Di Muzio: Radiotherapy, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
Claudio Fiorino: Medical Physics, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy

DOI: https://doi.org/10.3390/cancers16050934
Journal volume & issue: Vol. 16, no. 5
Article number: 934

Abstract

Purpose. Different machine learning (ML) models were compared for predicting toxicity after radiotherapy (RT) for breast cancer in a large cohort (n = 1314). Methods. The endpoint was RTOG grade 2 or 3 (G2/G3) acute toxicity, which occurred in 204 of the 1314 patients. The dataset, comprising 25 clinical, anatomical, and dosimetric features, was split into a training set (n = 984) and an internal test set (n = 330). The data were standardized; features with a high p-value in univariate logistic regression (LR) and highly correlated features (Spearman ρ > 0.8) were excluded; and synthetic samples of the minority class were generated to compensate for class imbalance. Twelve ML methods were considered. Model optimization and sequential backward selection were run to choose the best models with a parsimonious number of features. Finally, feature importance was derived for every model. Results. Model performance was compared on the training and test datasets across different metrics: the best-performing model was LightGBM. Logistic regression with three variables (LR3), selected via bootstrapping, performed similarly to the best-performing models. The area under the ROC curve (AUC) on the test data was slightly above 0.65 for the best models (highest value: 0.662, with LightGBM). Conclusions. No model performed best on all metrics: more complex ML models performed better; however, models with just three features performed comparably to the best models using many (n = 13–19) features.
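The abstract lists the preprocessing and evaluation steps without implementation details. As a rough, non-authoritative sketch, some of them can be illustrated in standard-library Python: standardization, the Spearman ρ > 0.8 correlation filter, synthetic minority oversampling, and AUC. Several choices here are assumptions, not the authors' method: standardization uses training-set statistics; the correlation filter is a hypothetical greedy rule that keeps whichever feature comes first; the synthetic samples use SMOTE-style interpolation (the abstract does not name the technique); and `k = 5` neighbours and all function and feature names are illustrative. The univariate LR p-value filter, the twelve models, and sequential backward selection are not shown.

```python
import math
import random


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: the Pearson correlation of the ranks (assumes non-constant inputs)."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def standardize(train_col, col):
    """z-score `col` using the mean/SD of the training column (assumed; avoids test leakage)."""
    mu = sum(train_col) / len(train_col)
    sd = math.sqrt(sum((v - mu) ** 2 for v in train_col) / len(train_col))
    return [(v - mu) / sd for v in col]


def drop_collinear(features, threshold=0.8):
    """Hypothetical greedy filter: keep a feature unless |rho| > threshold
    with a feature already kept. `features` maps name -> values; dict order sets priority."""
    kept = []
    for name, col in features.items():
        if all(abs(spearman(col, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def oversample_minority(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (assumed technique): each synthetic point lies on the
    segment between a random minority sample and one of its k nearest minority
    neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        others = [p for p in minority if p is not a]
        neighbours = sorted(others, key=lambda p: math.dist(a, p))[:k]
        b = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability that an event (label 1) scores higher
    than a non-event (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy features: feat_b is a monotone function of feat_a (rho = 1).
    feats = {"feat_a": [1, 2, 3, 4, 5], "feat_b": [2, 4, 6, 8, 11], "feat_c": [5, 1, 4, 2, 3]}
    print(drop_collinear(feats))  # ['feat_a', 'feat_c']
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The greedy rule is only one possible tie-breaker. A pipeline could instead keep, from each correlated pair, the feature with the lower univariate p-value. With 204 events among 1314 patients (about 15.5%), an oversampling step like the one sketched would be applied to the training split only, so the test AUC is computed on real patients.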

Published in Cancers

ISSN: 2072-6694 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens
Website: https://www.mdpi.com/journal/cancers/
