Journal of Artificial Intelligence and Data Mining (Nov 2023)
Fast COVID-19 Infection Prediction with In-House Data Using Machine Learning Classification Algorithms: A Case Study of Iran
Abstract
To mitigate COVID-19’s overwhelming burden, a rapid and efficient early screening scheme for COVID-19 in the first-line is required. Much research has utilized laboratory tests, CT scans, and X-ray data, which are obstacles to agile and real-time screening. In this study, we propose a user-friendly and low-cost COVID-19 detection model based on self-reportable data at home. The most exhausted input features were identified and included in the demographic, symptoms, semi-clinical, and past/present disease data categories. We employed Grid search to identify the optimal combination of hyperparameter settings that yields the most accurate prediction. Next, we apply the proposed model with tuned hyperparameters to 11 classic state-of-the-art classifiers. The results show that the XGBoost classifier provides the highest accuracy of 73.3%, but statistical analysis shows that there is no significant difference between the accuracy performance of XGBoost and AdaBoost, although it proved the superiority of these two methods over other methods. Furthermore, the most important features obtained using SHapely Adaptive explanations were analyzed. “Contact with infected people,” “cough,” “muscle pain,” “fever,” “age,” “Cardiovascular commodities,” “PO2,” and “respiratory distress” are the most important variables. Among these variables, the first three have a relatively large positive impact on the target variable. Whereas, “age,” “PO2”, and “respiratory distress” are highly negatively correlated with the target variable. Finally, we built a clinically operable, visible, and easy-to-interpret decision tree model to predict COVID-19 infection.
Keywords