Frontiers in Oncology (Oct 2024)
Creating an interactive database for nasopharyngeal carcinoma management: applying machine learning to evaluate metastasis and survival
Abstract
Objective: Nasopharyngeal carcinoma (NPC) patients frequently present with distant metastasis (DM), which is typically associated with poor prognosis. This study aims to develop and apply machine learning models to predict DM, overall survival (OS), and cancer-specific survival (CSS) in NPC patients, providing tools with improved predictive accuracy.

Methods: We retrieved over 8,000 NPC patient samples with associated clinical information from the Surveillance, Epidemiology, and End Results (SEER) database. Using two methods for handling missing values (imputation or deletion), we created six cohorts: DM-all, DM-slim, OS-all, OS-slim, CSS-all, and CSS-slim. Five machine learning models were deployed for the binary classification task of DM, and their performance was evaluated using the area under the curve (AUC). For the survival prediction tasks of OS and CSS, we constructed 45 combinations using nine survival machine learning algorithms. Model accuracy was assessed with the concordance index (C-index), 5-year AUC, and Brier score. Patients were stratified into two risk groups for survival analysis, and the survival curves were presented.

Results: This study examines the relationships between clinical factors and survival in NPC patients. The analysis, visualized through forest plots, indicates that demographic and clinical variables such as gender, marital status, tumor grade, and stage significantly affect metastatic risk and survival. Specifically, advanced stage is associated with higher metastatic risk and poorer survival, while enhanced treatments improve survival rates. For DM prediction, the random forest model was the most effective, with an AUC of 0.687. For OS prediction, the random survival forest (RSF) model consistently showed superior performance, with the highest mean C-index of 0.802, a 5-year AUC of 0.857, and a Brier score of 0.167. Similarly, for CSS prediction, the RSF model achieved a mean C-index of 0.822, a 5-year AUC of 0.884, and a Brier score of 0.165. An online Shiny server was developed to allow the models to be used freely and efficiently via http://npcml.shinyapps.io/NPCpre.

Conclusion: This study established an online tool based on machine learning models for NPC metastasis and survival prediction, providing a valuable reference for clinicians.
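The abstract reports AUC for the DM classification task and the C-index for the survival tasks, but does not say how these metrics were computed. The following is a minimal sketch of the two metrics, not the authors' implementation: the function names `roc_auc` and `harrell_c_index` and the toy inputs are hypothetical. It assumes a higher score means higher predicted risk, and for simplicity it skips pairs with tied follow-up times. The Brier score and the 5-year time-dependent AUC are not covered here.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case (tied scores count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times  : follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk score (higher = expected to fail sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier patient had an
        # observed event; tied times are skipped (a simplification).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only (not from SEER).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
c_index = harrell_c_index(times=[2, 4, 6, 8],
                          events=[1, 1, 0, 1],
                          risks=[0.9, 0.7, 0.8, 0.1])
print(auc, c_index)
```

On these toy inputs the AUC is 0.75 (three of four positive-negative pairs are ranked correctly) and the C-index is 0.8 (four of five comparable pairs are concordant). Both metrics share the same reading: 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.687 (DM) and 0.802 to 0.822 (OS, CSS) values sit.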
Keywords