Applying probability calibration to ensemble methods to predict 2-year mortality in patients with DLBCL

Shuanglong Fan; Zhiqiang Zhao; Hongmei Yu; Lei Wang; Chuchu Zheng; Xueqian Huang; Zhenhuan Yang; Meng Xing; Qing Lu; Yanhong Luo

doi:10.1186/s12911-020-01354-0

BMC Medical Informatics and Decision Making (Jan 2021)

Affiliations

Shuanglong Fan: Department of Health Statistics, School of Public Health, Shanxi Medical University
Zhiqiang Zhao: Department of Hematology, Shanxi Cancer Hospital
Hongmei Yu: Department of Health Statistics, School of Public Health, Shanxi Medical University
Lei Wang: Department of Health Statistics, School of Public Health, Shanxi Medical University
Chuchu Zheng: Department of Health Statistics, School of Public Health, Shanxi Medical University
Xueqian Huang: Department of Health Statistics, School of Public Health, Shanxi Medical University
Zhenhuan Yang: Department of Health Statistics, School of Public Health, Shanxi Medical University
Meng Xing: Department of Health Statistics, School of Public Health, Shanxi Medical University
Qing Lu: Department of Epidemiology and Biostatistics, Michigan State University
Yanhong Luo: Department of Health Statistics, School of Public Health, Shanxi Medical University

Journal volume & issue: Vol. 21, no. 1, pp. 1–12

Abstract

Background: Survival rates of patients with diffuse large B-cell lymphoma (DLBCL) differ with chemotherapy regimen, clinical stage, immunologic expression and other factors. Accurate prediction of mortality hazard is key to precision medicine, because it can help clinicians choose the therapy most likely to extend the survival of an individual patient with DLBCL. We therefore developed a model to predict the mortality hazard of DLBCL patients within 2 years of treatment.

Methods: We evaluated 406 patients with DLBCL and collected 17 variables from each patient. The predictive variables were selected by the Cox model, the logistic model and the random forest algorithm. Five classifiers were chosen as the base models for ensemble learning: naïve Bayes, logistic regression, random forest, support vector machine and feedforward neural network. We first calibrated the biased outputs of the five base models with probability calibration methods (shape-restricted polynomial regression, Platt scaling and isotonic regression). We then aggregated the outputs of the base models to predict the 2-year mortality of DLBCL patients using three strategies (stacking, simple averaging and weighted averaging). Finally, we assessed model performance over 300 hold-out tests.

Results: Gender, stage, IPI, KPS and rituximab were significant factors for predicting the death of DLBCL patients within 2 years of treatment. Among all methods, the stacking model whose base models were first calibrated by shape-restricted polynomial regression performed best (AUC = 0.820, ECE = 8.983, MCE = 21.265). The stacking model without probability calibration performed worse (AUC = 0.806, ECE = 9.866, MCE = 24.850). For the simple averaging and weighted averaging models, probability calibration also reduced the prediction error of the ensemble.

Conclusions: Among all the methods compared, the proposed model had the lowest prediction error when predicting the 2-year mortality of DLBCL patients. These results suggest that applying probability calibration to ensemble learning is an effective modeling strategy.
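The simple averaging strategy and the two calibration metrics reported in the Results can be illustrated with a minimal sketch. The abstract does not say how ECE and MCE were computed, so the equal-width binning, the default of 10 bins and the percentage scale used here are assumptions, and the function names `simple_average` and `calibration_errors` are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence, Tuple


def simple_average(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the predicted probabilities of several base models, patient by patient."""
    n_models = len(prob_sets)
    return [sum(column) / n_models for column in zip(*prob_sets)]


def calibration_errors(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> Tuple[float, float]:
    """Return (ECE, MCE) in percent, using equal-width probability bins (an assumption)."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    mce = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        gap = abs(mean_pred - observed)
        ece += len(members) / n * gap  # gap weighted by bin size
        mce = max(mce, gap)            # worst gap over all bins
    return 100 * ece, 100 * mce


# Made-up probabilities of death within 2 years from two hypothetical base models.
model_a = [0.9, 0.2, 0.7, 0.1]
model_b = [0.7, 0.4, 0.5, 0.3]
died = [1, 0, 1, 0]

ensemble = simple_average([model_a, model_b])
ece, mce = calibration_errors(ensemble, died)
print(f"ECE = {ece:.3f}, MCE = {mce:.3f}")
```

Lower ECE and MCE mean the predicted probabilities track the observed death rates more closely, which is the sense in which the calibrated ensembles in the Results have lower prediction error.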

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
