BMC Medical Informatics and Decision Making (Jul 2022)
Application of machine learning models based on decision trees in classifying the factors affecting mortality of COVID-19 patients in Hamadan, Iran
Abstract
Abstract Background Due to the high mortality of COVID-19 patients, the use of a high-precision classification model of patient’s mortality that is also interpretable, could help reduce mortality and take appropriate action urgently. In this study, the random forest method was used to select the effective features in COVID-19 mortality and the classification was performed using logistic model tree (LMT), classification and regression tree (CART), C4.5, and C5.0 tree based on important features. Methods In this retrospective study, the data of 2470 COVID-19 patients admitted to hospitals in Hamadan, west Iran, were used, of which 75.02% recovered and 24.98% died. To classify, at first among the 25 demographic, clinical, and laboratory findings, features with a relative importance more than 6% were selected by random forest. Then LMT, C4.5, C5.0, and CART trees were developed and the accuracy of classification performance was evaluated with recall, accuracy, and F1-score criteria for training, test, and total datasets. At last, the best tree was developed and the receiver operating characteristic curve and area under the curve (AUC) value were reported. Results The results of this study showed that among demographic and clinical features gender and age, and among laboratory findings blood urea nitrogen, partial thromboplastin time, serum glutamic-oxaloacetic transaminase, and erythrocyte sedimentation rate had more than 6% relative importance. Developing the trees using the above features revealed that the CART with the values of F1-score, Accuracy, and Recall, 0.8681, 0.7824, and 0.955, respectively, for the test dataset and 0.8667, 0.7834, and 0.9385, respectively, for the total dataset had the best performance. The AUC value obtained for the CART was 79.5%. Conclusions Finding a highly accurate and qualified model for interpreting the classification of a response that is considered clinically consequential is critical at all stages, including treatment and immediate decision making. In this study, the CART with its high accuracy for diagnosing and classifying mortality of COVID-19 patients as well as prioritizing important demographic, clinical, and laboratory findings in an interpretable format, risk factors for prognosis of COVID-19 patients mortality identify and enable immediate and appropriate decisions for health professionals and physicians.
Keywords