Fast COVID-19 Infection Prediction with In-House Data Using Machine Learning Classification Algorithms: A Case Study of Iran

Ali Shabrandi; Ali Rajabzadeh Ghatari; Nader Tavakoli; Mohammad Dehghan Nayeri; Sahar Mirzaei

doi:10.22044/jadm.2023.13291.2458

Journal of Artificial Intelligence and Data Mining (Nov 2023)

Affiliations

Ali Shabrandi: Department of Industrial Management, Faculty of Management and Economics, Tarbiat Modares University, Tehran, Iran.
Ali Rajabzadeh Ghatari: Department of Industrial Management, Faculty of Management and Economics, Tarbiat Modares University, Tehran, Iran.
Nader Tavakoli: Department of Emergency Medicine, Trauma and Injury Research Center, Iran University of Medical Sciences, Tehran, Iran.
Mohammad Dehghan Nayeri: Department of Industrial Management, Faculty of Management and Economics, Tarbiat Modares University, Tehran, Iran.
Sahar Mirzaei: Department of Health and Environment, Iran University of Medical Sciences, Tehran, Iran.

DOI: https://doi.org/10.22044/jadm.2023.13291.2458
Journal volume & issue: Vol. 11, no. 4
pp. 573 – 585

Abstract

To mitigate the overwhelming burden of COVID-19, a rapid and efficient early screening scheme is required at the first line of care. Much prior research has relied on laboratory tests, CT scans, and X-ray data, which are obstacles to agile, real-time screening. In this study, we propose a user-friendly and low-cost COVID-19 detection model based on data that can be self-reported at home. The most exhaustive set of input features was identified and grouped into demographic, symptom, semi-clinical, and past/present disease categories. We employed grid search to identify the combination of hyperparameter settings that yields the most accurate prediction, and then applied the proposed model with tuned hyperparameters to 11 classic, state-of-the-art classifiers. The results show that the XGBoost classifier provides the highest accuracy, 73.3%; statistical analysis shows no significant difference between the accuracy of XGBoost and AdaBoost, but confirms the superiority of these two methods over the others. Furthermore, the most important features, obtained using SHapley Additive exPlanations (SHAP), were analyzed. “Contact with infected people,” “cough,” “muscle pain,” “fever,” “age,” “cardiovascular comorbidities,” “PO2,” and “respiratory distress” are the most important variables. Among these, the first three have a relatively large positive impact on the target variable, whereas “age,” “PO2,” and “respiratory distress” are highly negatively correlated with it. Finally, we built a clinically operable, visible, and easy-to-interpret decision tree model to predict COVID-19 infection.
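
The grid search step mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration only: the data are synthetic, the feature names and labeling rule are invented, and a small k-nearest-neighbors classifier stands in for the classifiers the study actually evaluated (such as XGBoost and AdaBoost). The helper names (`make_synthetic_data`, `cv_accuracy`, `grid_search`) are hypothetical and do not come from the paper.

```python
import itertools
import random


def make_synthetic_data(n=200, seed=0):
    """Hypothetical stand-in for self-reported screening data (NOT the study's data)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        contact = rng.randint(0, 1)  # contact with an infected person (0/1)
        cough = rng.randint(0, 1)
        fever = rng.randint(0, 1)
        age = rng.uniform(0, 1)      # age rescaled to [0, 1]
        # Arbitrary labeling rule plus noise, chosen only for illustration.
        score = 1.5 * contact + cough + fever - age + rng.gauss(0, 0.5)
        rows.append((contact, cough, fever, age))
        labels.append(1 if score > 1.2 else 0)
    return rows, labels


def distance(a, b, metric):
    """Manhattan or Euclidean distance between two feature tuples."""
    if metric == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_x, train_y, query, k, metric):
    """Majority vote among the k nearest training points (k is odd, so no ties)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: distance(train_x[i], query, metric))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0


def cv_accuracy(x, y, params, n_folds=5):
    """Mean accuracy over n_folds interleaved cross-validation folds."""
    n = len(x)
    fold_scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, x[i], **params) == y[i]
            for i in test_idx
        )
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / n_folds


def grid_search(x, y, grid):
    """Score every combination in the grid and return the best one."""
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_accuracy(x, y, params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


X, y = make_synthetic_data()
param_grid = {"k": [1, 3, 5, 7], "metric": ["manhattan", "euclidean"]}
best_params, best_score, all_results = grid_search(X, y, param_grid)
print(best_params, round(best_score, 3))
```

The search is exhaustive: every combination in `param_grid` is scored by cross-validated accuracy and the highest-scoring setting is kept. In practice a library routine such as scikit-learn's `GridSearchCV` would normally replace the hand-written loops; the abstract does not state which implementation the authors used.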

Published in Journal of Artificial Intelligence and Data Mining

ISSN: 2322-5211 (Print); 2322-4444 (Online)
Publisher: Shahrood University of Technology
Country of publisher: Iran, Islamic Republic of
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: http://jad.shahroodut.ac.ir/
