Probability calibration-based prediction of recurrence rate in patients with diffuse large B-cell lymphoma

Shuanglong Fan; Zhiqiang Zhao; Yanbo Zhang; Hongmei Yu; Chuchu Zheng; Xueqian Huang; Zhenhuan Yang; Meng Xing; Qing Lu; Yanhong Luo

doi:10.1186/s13040-021-00272-9

BioData Mining (Aug 2021)

Affiliations

Shuanglong Fan: Department of Health Statistics, School of Public Health, Shanxi Medical University
Zhiqiang Zhao: Department of Hematology, Shanxi Cancer Hospital
Yanbo Zhang: Department of Health Statistics, School of Public Health, Shanxi Medical University
Hongmei Yu: Department of Health Statistics, School of Public Health, Shanxi Medical University
Chuchu Zheng: Department of Health Statistics, School of Public Health, Shanxi Medical University
Xueqian Huang: Department of Health Statistics, School of Public Health, Shanxi Medical University
Zhenhuan Yang: Department of Health Statistics, School of Public Health, Shanxi Medical University
Meng Xing: Department of Health Statistics, School of Public Health, Shanxi Medical University
Qing Lu: Department of Epidemiology and Biostatistics, Michigan State University
Yanhong Luo: Department of Health Statistics, School of Public Health, Shanxi Medical University

DOI: https://doi.org/10.1186/s13040-021-00272-9
Journal volume & issue: Vol. 14, no. 1
pp. 1 – 18

Abstract

Background: Although many patients have a good prognosis with standard therapy, 30–50% of diffuse large B-cell lymphoma (DLBCL) cases may relapse after treatment. Statistical and computational intelligence models are powerful tools for assessing prognosis; however, many cannot generate accurate risk (probability) estimates. This paper therefore develops probability calibration-based versions of traditional machine learning algorithms to predict the risk of relapse in patients with DLBCL.

Methods: Five machine learning algorithms were assessed: naïve Bayes (NB), logistic regression (LR), random forest (RF), support vector machine (SVM) and feedforward neural network (FFNN). Three methods were used to develop a probability calibration-based version of each algorithm: Platt scaling (Platt), isotonic regression (IsoReg) and shape-restricted polynomial regression (RPR). Performance comparisons were based on the average results of a stratified hold-out test repeated 500 times. Discrimination (i.e., classification ability) was evaluated with the area under the receiver operating characteristic curve (AUC). Calibration (i.e., risk prediction accuracy) was assessed with the Hosmer-Lemeshow (H-L) goodness-of-fit test, the expected calibration error (ECE), the maximum calibration error (MCE) and the Brier score (BS).

Results: Sex, stage, IPI, KPS, GCB, CD10 and rituximab were significant factors predicting the 3-year recurrence rate of patients with DLBCL. Among the 5 uncalibrated algorithms, the LR (ECE = 8.517, MCE = 20.100, BS = 0.188) and FFNN (ECE = 8.238, MCE = 20.150, BS = 0.184) models were well calibrated. The errors of the initial risk estimates of the NB (ECE = 15.711, MCE = 34.350, BS = 0.212), RF (ECE = 12.740, MCE = 27.200, BS = 0.201) and SVM (ECE = 9.872, MCE = 23.800, BS = 0.194) models were large. With probability calibration, the biased NB, RF and SVM models were well corrected. The calibration errors of the LR and FFNN models were not further improved by any of the probability calibration methods. Among the 3 calibration methods, RPR achieved the best calibration for both the RF and SVM models. IsoReg showed no clear benefit for the NB, RF or SVM models.

Conclusions: Although all of these algorithms have good classification ability, several cannot generate accurate risk estimates. Probability calibration is an effective method of improving the accuracy of these poorly calibrated algorithms. Our risk model of DLBCL demonstrates good discrimination and calibration ability and has the potential to help clinicians make optimal therapeutic decisions to achieve precision medicine.
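The calibration measures named in the abstract can be illustrated with a short sketch. The Python below is not the authors' code. It assumes equal-width probability bins (the abstract does not state the binning scheme), and the function names `brier_score` and `calibration_errors` are hypothetical. It returns ECE and MCE as fractions between 0 and 1, whereas the values in the abstract appear to be on a percentage scale.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_errors(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> Tuple[float, float]:
    """Return (ECE, MCE) as fractions, using equal-width probability bins.

    Assumption: equal-width binning; other schemes (e.g. equal-frequency)
    would give somewhat different values.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))

    n = len(y_true)
    ece = 0.0
    mce = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in members) / len(members)   # event rate
        predicted = sum(p for _, p in members) / len(members)  # mean risk
        gap = abs(observed - predicted)
        ece += len(members) / n * gap  # weighted average gap
        mce = max(mce, gap)            # worst-case gap
    return ece, mce


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    outcomes = [0, 0, 1, 1]
    risks = [0.1, 0.4, 0.6, 0.9]
    ece, mce = calibration_errors(outcomes, risks, n_bins=2)
    print(f"ECE={ece:.3f} MCE={mce:.3f} BS={brier_score(outcomes, risks):.3f}")
```

ECE weights each bin's gap between observed event rate and mean predicted risk by the share of patients in that bin, while MCE reports the single largest gap, so a model can have a modest ECE and still a large MCE if one risk stratum is badly estimated.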

Published in BioData Mining

ISSN: 1756-0381 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Analysis
Website: https://biodatamining.biomedcentral.com/
