Research and Practice in Thrombosis and Haemostasis (Feb 2020)
Machine learning to predict venous thrombosis in acutely ill medical patients
Abstract
Background: Acutely ill patients at high risk for venous thromboembolism (VTE) may be identified clinically or by use of integer‐based scoring systems. These scores have demonstrated modest performance in external data sets.

Objectives: To evaluate the performance of machine learning models compared to the IMPROVE score.

Methods: The APEX trial randomized 7513 acutely ill medical patients to extended‐duration betrixaban vs. enoxaparin. A super learner model (ML) incorporating 68 variables was built to predict VTE by combining estimates from 5 families of candidate models. A “reduced” model (rML) was also developed using 16 variables that were thought, a priori, to be associated with VTE. The IMPROVE score was calculated for each patient. Model performance in predicting a composite VTE end point was assessed by discrimination and calibration. The frequency of predicted VTE risks was plotted and divided into tertiles, and VTE risks were compared across tertiles.

Results: The ML and rML algorithms outperformed the IMPROVE score in predicting VTE (c‐statistic: 0.69, 0.68, and 0.59, respectively). The Hosmer‐Lemeshow goodness‐of‐fit P‐value was 0.06 for ML, 0.44 for rML, and <0.001 for the IMPROVE score. The observed event rate was 2.5% in the lowest tertile, 4.8% in the middle tertile, and 11.4% in the highest tertile. Patients in the highest tertile of VTE risk had a 5‐fold increase in odds of VTE compared to those in the lowest tertile.

Conclusion: The super learner algorithms improved discrimination and calibration compared to the IMPROVE score for predicting VTE in acutely ill medical patients.
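As a rough illustration of two quantities reported in the abstract, the following Python sketch computes a crude odds ratio from the rounded tertile event rates and a c-statistic by pairwise concordance. It is not the authors' analysis code: the function names are hypothetical, the odds ratio is unadjusted and based only on the rounded percentages above, and the four-patient data set used for the c-statistic is invented toy data.

```python
def odds(p):
    """Convert an event probability into odds."""
    return p / (1.0 - p)


def odds_ratio(p_high, p_low):
    """Crude (unadjusted) odds ratio between two event rates."""
    return odds(p_high) / odds(p_low)


def c_statistic(risks, events):
    """Fraction of (event, non-event) pairs in which the patient with the
    event received the higher predicted risk; ties count as one half."""
    concordant = 0.0
    pairs = 0
    for risk_event, e_i in zip(risks, events):
        if e_i != 1:
            continue
        for risk_nonevent, e_j in zip(risks, events):
            if e_j != 0:
                continue
            pairs += 1
            if risk_event > risk_nonevent:
                concordant += 1.0
            elif risk_event == risk_nonevent:
                concordant += 0.5
    return concordant / pairs


# Highest vs. lowest tertile, using the rounded rates from the abstract.
print(round(odds_ratio(0.114, 0.025), 2))  # 5.02

# Toy data (assumption, not from the trial): predicted risks and outcomes.
toy_risks = [0.10, 0.40, 0.35, 0.80]
toy_events = [0, 0, 1, 1]
print(c_statistic(toy_risks, toy_events))  # 0.75
```

The crude ratio of about 5.0 agrees with the reported 5‐fold increase in odds. A c‐statistic of 0.5 corresponds to chance‐level ranking, which puts the reported values of 0.69, 0.68, and 0.59 in context.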
Keywords