Journal of Statistical Theory and Applications (JSTA) (Sep 2023)

An AI-driven Predictive Model for Pancreatic Cancer Patients Using Extreme Gradient Boosting

  • Aditya Chakraborty
  • Chris P. Tsokos

DOI
https://doi.org/10.1007/s44199-023-00063-7
Journal volume & issue
Vol. 22, no. 4
pp. 262 – 282

Abstract


Pancreatic cancer is one of the deadliest cancers worldwide. Most patients are diagnosed at Stage III or Stage IV, when the chances of survival are very low. Using the XGBoost (eXtreme Gradient Boosting) algorithm, this study builds an efficient data-driven predictive model of the survival times of patients diagnosed with pancreatic cancer from the associated risk factors, and identifies the factors that contribute most to those survival times. A grid search was used to find the optimal hyperparameter values by minimizing the root mean square error (RMSE); the final hyperparameters were selected by comparing 243 competing models. To validate the model, we compared its performance with ten deep neural network models, grown sequentially with different activation functions and optimizers, and with an ensemble model built using a Gradient Boosting Machine (GBM). The proposed XGBoost model outperformed all competing models in terms of RMSE. After developing the model, we ranked the individual risk factors by their contribution to the predicted response. This ranking is important for pancreatic cancer research organizations, because it helps them focus their resources on the risk factors that cause or influence this type of cancer. The three most influential risk factors for the survival of pancreatic cancer patients were the patient's age, current BMI, and years of cigarette smoking, contributing 35.5%, 24.3%, and 14.93%, respectively. The predictive model is approximately 96.42% accurate in predicting the survival times of patients diagnosed with pancreatic cancer and performs very well on test data.
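The abstract describes selecting hyperparameters by grid search, keeping the combination with the lowest RMSE = sqrt((1/n) Σ (y_i − ŷ_i)^2). The sketch below illustrates that procedure using only the Python standard library. It is not the authors' code. It swaps XGBoost for a deliberately simple gradient-boosting regressor built from one-split stumps, and the grid, the toy data, and the helper names (`fit_stump`, `fit_boosting`, `grid_search`) are all hypothetical. The abstract does not list the searched grid. The count 243 = 3^5 is consistent with, for example, three candidate values for each of five hyperparameters, but that reading is an assumption.

```python
import itertools
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_stump(x, residuals):
    """Least-squares regression stump (one split) on a single numeric feature."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda xi: lv if xi <= threshold else rv


def fit_boosting(x, y, n_estimators, learning_rate):
    """Hypothetical stand-in for XGBoost: plain squared-error gradient boosting.

    Each stump is fit to the current residuals and added with shrinkage.
    """
    base = sum(y) / len(y)
    stumps, pred = [], [base] * len(y)
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


def grid_search(x_train, y_train, x_val, y_val, grid):
    """Train one model per grid combination; return the lowest validation RMSE and all results."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        model = fit_boosting(x_train, y_train, **params)
        results.append((rmse(y_val, [model(xi) for xi in x_val]), params))
    return min(results, key=lambda r: r[0]), results


# Toy data (not from the study): one feature, roughly piecewise-constant response.
x_train = [1, 2, 3, 4, 5, 6, 7, 8]
y_train = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
x_val, y_val = [1.5, 4.5, 7.5], [1.1, 3.0, 5.1]

# Hypothetical 3 x 3 grid over two hyperparameters.
grid = {"n_estimators": [5, 20, 50], "learning_rate": [0.05, 0.1, 0.3]}
(best_rmse, best_params), results = grid_search(x_train, y_train, x_val, y_val, grid)
print(len(results), best_params, round(best_rmse, 3))

# Five hyperparameters with three candidate values each give 3**5 combinations.
hypothetical_grid = {f"param_{i}": [0, 1, 2] for i in range(5)}
print(math.prod(len(v) for v in hypothetical_grid.values()))  # 243
```

Each combination is trained on the training split and scored on a held-out validation split. Scoring on the training data instead would tend to favor the most complex settings. For a real survival-time model, the data split and the grid would have to follow the study's design, which the abstract does not specify.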
The analytical methodology used to develop the proposed XGBoost model can also be applied to other prediction tasks. For example, it can be used to predict the time to death from a specific type of cancer, given a set of numeric and non-numeric features.
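Gradient-boosted tree models are commonly trained on numeric inputs, so non-numeric features, such as categorical patient attributes, are often converted to 0/1 indicator (one-hot) columns first. The abstract does not say how the authors encoded their non-numeric features. The sketch below assumes one-hot encoding. The categorical fields `sex` and `stage`, the records, and the helper `one_hot_encode` are all hypothetical; only age, BMI, and smoking years are named in the abstract.

```python
def one_hot_encode(records, categorical, levels=None):
    """Convert dict records to numeric rows (hypothetical helper).

    Numeric fields are kept as floats. Each categorical field becomes one 0/1
    column per level. Pass the `levels` learned on training data when encoding
    new records, so that columns line up. An unseen level encodes as all zeros.
    """
    if levels is None:
        levels = {c: sorted({r[c] for r in records}) for c in categorical}
    numeric = sorted(k for k in records[0] if k not in categorical)
    columns = numeric + [f"{c}={v}" for c in categorical for v in levels[c]]
    rows = []
    for r in records:
        row = [float(r[k]) for k in numeric]
        for c in categorical:
            row.extend(1.0 if r[c] == v else 0.0 for v in levels[c])
        rows.append(row)
    return columns, rows, levels


# Hypothetical training records (not study data).
train = [
    {"age": 67, "bmi": 27.4, "smoking_years": 20, "sex": "F", "stage": "IV"},
    {"age": 58, "bmi": 31.0, "smoking_years": 0, "sex": "M", "stage": "III"},
]
columns, rows, levels = one_hot_encode(train, categorical=["sex", "stage"])
print(columns)
print(rows)

# Encode a new record with the training levels; "II" was not seen in training.
new = [{"age": 70, "bmi": 22.0, "smoking_years": 35, "sex": "M", "stage": "II"}]
_, new_rows, _ = one_hot_encode(new, categorical=["sex", "stage"], levels=levels)
print(new_rows)
```

Reusing the training `levels` at prediction time keeps the column layout identical to what the model was trained on. This matters when the model is used, as the abstract suggests, to predict outcomes for new patients.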

Keywords