PLoS ONE (Jan 2024)

Prognosing post-treatment outcomes of head and neck cancer using structured data and machine learning: A systematic review.

  • Mohammad Moharrami,
  • Parnia Azimian Zavareh,
  • Erin Watson,
  • Sonica Singhal,
  • Alistair E W Johnson,
  • Ali Hosni,
  • Carlos Quinonez,
  • Michael Glogauer

DOI: https://doi.org/10.1371/journal.pone.0307531
Journal volume & issue: Vol. 19, no. 7, p. e0307531

Abstract

Background: This systematic review aimed to evaluate the performance of machine learning (ML) models in predicting post-treatment survival and disease progression outcomes, including recurrence and metastasis, in head and neck cancer (HNC) using clinicopathological structured data.

Methods: A systematic search was conducted across the Medline, Scopus, Embase, Web of Science, and Google Scholar databases. The methodological characteristics and performance metrics of studies that developed and validated ML models were assessed. The risk of bias was evaluated using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).

Results: Out of 5,560 unique records, 34 articles were included. For survival outcomes, ML models outperformed the Cox proportional hazards model in time-to-event analyses for HNC, with a concordance index of 0.70-0.79 vs. 0.66-0.76, and for all sub-sites, including the oral cavity (0.73-0.89 vs. 0.69-0.77) and larynx (0.71-0.85 vs. 0.57-0.74). In binary classification analyses, the area under the receiver operating characteristic curve (AUROC) of ML models ranged from 0.75 to 0.97, with an F1-score of 0.65-0.89, for HNC; the AUROC was 0.61-0.91 and the F1-score 0.58-0.86 for the oral cavity; and the AUROC was 0.76-0.97 and the F1-score 0.63-0.92 for the larynx. Models predicting disease-specific survival performed better than those predicting overall survival, but the performance of ML models did not differ between three- and five-year follow-up durations. For disease progression outcomes, no time-to-event metrics were reported for ML models. For binary classification in the oral cavity, the only sub-site evaluated, the AUROC ranged from 0.67 to 0.97, with F1-scores between 0.53 and 0.89.

Conclusions: ML models have demonstrated considerable potential in predicting post-treatment survival and disease progression, consistently outperforming traditional linear models and their derived nomograms. Future research should incorporate more comprehensive treatment features, emphasize disease progression outcomes, and establish model generalizability through external validations and the use of multicenter datasets.
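
For readers less familiar with the three metrics compared in the Results, the following is a minimal Python sketch of how a concordance index, an AUROC, and an F1-score can be computed. It is illustrative only and is not taken from the review or from any included study: the function names are hypothetical, the concordance index shown is a simplified Harrell's C that skips pairs with tied times, and the toy patient data are invented.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C (assumption: pairs with tied times are skipped)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher predicted risk failed earlier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, predictions):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented toy data: follow-up in months, event flag (1 = death observed,
# 0 = censored), and a model's predicted risk for four hypothetical patients.
times = [5, 10, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risk))  # 0.8

# Invented binary outcome (1 = event within the follow-up window).
labels = [1, 1, 0, 0]
print(auroc(labels, risk))  # 0.75
print(f1_score(labels, [1 if r >= 0.5 else 0 for r in risk]))  # 0.5
```

The concordance index uses the censoring information, which is why it suits time-to-event analyses, whereas AUROC and F1-score apply once the outcome has been reduced to a binary label at a fixed follow-up time. The 0.5 threshold used for the F1-score here is an arbitrary choice for the example.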