Journal of Army Medical University (Oct 2024)

Construction of postoperative prognostic model for primary liver cancer based on SMOTE and machine learning

  • PAN Bi,
  • YU Jinghu,
  • HUANG Yixian

DOI
https://doi.org/10.16016/j.2097-0927.202310052
Journal volume & issue
Vol. 46, no. 19
pp. 2236 – 2240

Abstract


Objective: To construct a model for predicting prognosis after surgical treatment of primary liver cancer, based on the synthetic minority over-sampling technique (SMOTE) and machine learning. Methods: A retrospective cohort study was conducted on 4,297 patients with primary liver cancer from the Surveillance, Epidemiology, and End Results (SEER) database. One-hot encoding and multiple imputation were used to preprocess the collected data, and the SMOTE algorithm was employed to correct the class imbalance. The resulting clinical variables were entered into the machine learning models. Prognostic prediction models (SMOTE+DT/RF/GBDT/XGBoost) were built on decision tree (DT), random forest (RF), gradient boosting decision tree (GBDT), and eXtreme Gradient Boosting (XGBoost), and the best model was selected by comparing their performance. Finally, a prognostic analysis system for primary liver cancer was developed from the optimal model and then visualized. Results: The combined SMOTE+RF model showed the best predictive performance, with a higher area under the curve (0.895), accuracy (0.811), and precision (0.806) than the other models in receiver operating characteristic (ROC) curve analysis. Conclusion: The SMOTE+RF prognostic prediction model can effectively predict survival outcomes in patients with primary liver cancer.
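
The over-sampling step named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote`, its parameters, and the toy data are assumptions made for illustration, and it uses only the Python standard library. It shows the core SMOTE idea, which is to create each synthetic minority sample by interpolating between a minority sample and one of its k nearest minority-class neighbours. It handles numeric features only; in practice a maintained implementation such as the `SMOTE` class in imbalanced-learn would normally be used, followed by a classifier such as a random forest.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of equal-length numeric feature vectors (minority class only).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample
        # (Euclidean distance), excluding the base sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (hypothetical, not from SEER): 8 majority vs 3 minority samples.
majority = [[float(x), float(x % 3)] for x in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Create enough synthetic samples to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a line segment between two real minority samples, the new samples stay inside the region the minority class already occupies. Over-sampling of this kind is usually applied to the training split only, so that evaluation metrics such as AUC, accuracy, and precision are computed on unaltered data.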

Keywords