Health Science Reports (Jul 2024)

The most important variables associated with death due to COVID‐19 disease, based on three data mining models Decision Tree, AdaBoost, and Support Vector Machine: A cross‐sectional study

  • Bita Shokri gharehhasani,
  • Mansour Rezaei,
  • Armin Naghipour,
  • Nazanine Sayad,
  • Shayan Mostafaei,
  • Ehsan Alimohammadi

DOI
https://doi.org/10.1002/hsr2.2266
Journal volume & issue
Vol. 7, no. 7
pp. n/a – n/a

Abstract

Introduction: Death due to COVID‐19 is one of the biggest health challenges in the world, and many models can predict it. This study aimed to fit and compare Decision Tree (DT), Support Vector Machine (SVM), and AdaBoost models for predicting death due to COVID‐19.

Methods: Variables were described using mean (SD) and frequency (%). The relationship between each variable and death due to COVID‐19 was assessed with the chi‐square test at a significance level of 0.05. The DT, SVM, and AdaBoost models were compared on sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve. Analyses were done in R using the psych, caTools, ROSE (random over‐sampling examples), rpart, and rpart.plot packages.

Results: Of the 23,054 patients studied, 10,935 (46.5%) were women and 12,569 (53.5%) were men. The mean age of the patients was 54.9 ± 21.0 years. Gender, fever, cough, muscle pain, smell and taste, abdominal pain, nausea and vomiting, diarrhea, anorexia, dizziness, chest pain, intubation, cancer, diabetes, chronic blood disease, immunodeficiency, pregnancy, dialysis, and chronic lung disease were each significantly associated with death in COVID‐19 patients (p < 0.05). Sensitivity, specificity, accuracy, and area under the ROC curve were, respectively, 0.60, 0.68, 0.71, and 0.75 for the DT model; 0.54, 0.62, 0.63, and 0.71 for the SVM model; and 0.59, 0.65, 0.69, and 0.74 for the AdaBoost model.

Conclusion: DT had higher predictive power than the SVM and AdaBoost models. Researchers in other fields may therefore consider DT for predicting their outcomes of interest. Future studies could also use other approaches, such as random forest or XGBoost, to improve accuracy.
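The four comparison metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' R code: it is a plain Python illustration, and the function names (`confusion_counts`, `roc_auc`, `classification_metrics`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) for binary labels, 1 = died, 0 = survived."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy at an assumed threshold, plus AUC."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),   # share of deaths correctly flagged
        "specificity": tn / (tn + fp),   # share of survivors correctly cleared
        "accuracy": (tp + tn) / len(y_true),
        "auc": roc_auc(y_true, y_score),
    }


# Toy data (hypothetical, not from the study): true outcomes and model scores.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
metrics = classification_metrics(toy_true, toy_score)
print(metrics)
```

On this toy data, sensitivity, specificity, and accuracy are each 0.75 and the AUC is 0.9375. Running the same function on each model's predicted scores would give the kind of side-by-side comparison the abstract reports for DT, SVM, and AdaBoost.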

Keywords