Risk Management and Healthcare Policy (Oct 2021)
Machine Learning-Based Prediction for 4-Year Risk of Metabolic Syndrome in Adults: A Retrospective Cohort Study
Hui Zhang,1,* Dandan Chen,1,* Jing Shao,1 Ping Zou,2 Nianqi Cui,3 Leiwen Tang,1 Xiyi Wang,4 Dan Wang,1 Jingjie Wu,1 Zhihong Ye1
1Department of Nursing, Zhejiang University School of Medicine Sir Run Run Shaw Hospital, Hangzhou, Zhejiang, People’s Republic of China; 2Department of Scholar Practitioner Program, School of Nursing, Nipissing University, Toronto, Ontario, Canada; 3Department of Nursing, The Second Affiliated Hospital Zhejiang University School of Medicine, Hangzhou, Zhejiang, People’s Republic of China; 4Department of Nursing, School of Nursing, Shanghai JiaoTong University, Shanghai, People’s Republic of China
*These authors contributed equally to this work
Correspondence: Zhihong Ye, Department of Nursing, Zhejiang University School of Medicine Sir Run Run Shaw Hospital, 3# Qingchun Dong Road, Jianggan District, Hangzhou, Zhejiang, People’s Republic of China; Tel +86 13606612119
Abstract
Purpose: Machine learning (ML) techniques have emerged as a promising tool for risk prediction and decision-making in many medical domains. We aimed to compare the performance of ML-based methods in predicting the 4-year risk of metabolic syndrome in adults with that of a previously developed logistic regression model.
Patients and Methods: This retrospective cohort study employed a temporal validation strategy. Three popular ML techniques were used to build the prognostic models: artificial neural networks (ANN), classification and regression tree (CART), and support vector machine (SVM). The logistic regression model and the ML models used the same five predictors. Model performance was compared in terms of discrimination, calibration, Brier score, and decision curve analysis.
Results: In internal validation, discrimination was above 0.7 for all models except the CART model, whereas in external validation the logistic regression model showed the highest discrimination (0.782) and the smallest differences in discrimination.
The logistic regression model had the best calibration, and the artificial neural network model also showed satisfactory calibration in both internal and external validation. For overall performance, logistic regression had the smallest differences in Brier score in both internal and external validation, and it also had the largest net benefit in external validation.
Conclusion: Overall, this study indicated that the logistic regression model performed as well as the more flexible ML-based prediction models in internal validation and performed best in external validation. For clinical use, when a logistic regression model performs similarly to ML-based prediction models, the simpler and more interpretable model should be chosen.
Keywords: prognosis model, metabolic syndrome, calibration, discrimination, machine learning
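The abstract compares models by discrimination, Brier score, and decision curve analysis. As a rough illustration of what these quantities measure, the sketch below computes the standard textbook definitions in plain Python, assuming binary outcomes coded 0/1 and predicted 4-year risks given as probabilities. The function names and the toy data are hypothetical and are not the study's analysis code or data; calibration assessment is not shown. The C-statistic here is the concordance probability (equal to the ROC AUC for a binary outcome), the Brier score is the mean squared error of the predicted risks, and net benefit at a threshold probability pt is TP/n - FP/n * pt/(1 - pt).

```python
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Share of (event, non-event) pairs where the event has the higher
    predicted risk; tied predictions count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical toy data: observed outcomes and predicted risks.
    y = [1, 0, 1, 0, 1, 0]
    p = [0.9, 0.2, 0.6, 0.4, 0.3, 0.1]
    print(f"C-statistic: {c_statistic(y, p):.3f}")  # 0.889
    print(f"Brier score: {brier_score(y, p):.3f}")  # 0.145
    # A decision curve plots net benefit across thresholds, alongside the
    # "treat all" (every risk set to 1.0) and "treat none" (net benefit 0) strategies.
    for t in (0.1, 0.2, 0.3, 0.4, 0.5):
        model_nb = net_benefit(y, p, t)
        treat_all_nb = net_benefit(y, [1.0] * len(y), t)
        print(f"pt={t:.1f}  model={model_nb:.3f}  treat_all={treat_all_nb:.3f}")
```

In a decision curve analysis, a model is considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all and treat-none curves.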