BMC Medical Informatics and Decision Making (Jun 2024)
A tree-based explainable AI model for early detection of Covid-19 using physiological data
Abstract
The outbreak of COVID-19 in 2020 posed significant challenges for countries worldwide, and many studies have since applied Artificial Intelligence (AI) and data science techniques to disease detection. Although COVID-19 cases have declined, infections and deaths continue to occur around the world, so early detection of COVID-19 before the onset of symptoms remains crucial to reducing its impact. Wearable devices such as smartwatches have proven to be valuable sources of physiological data, including heart rate (HR) and sleep quality, that enable the detection of inflammatory diseases. In this study, we use an existing dataset of individual step counts and heart rate data to predict the probability of COVID-19 infection before the onset of symptoms. We train three main model architectures, the Gradient Boosting (GB) classifier, CatBoost trees, and the TabNet classifier, on the physiological data and compare their performance. We also add an interpretability layer to our best-performing model, which clarifies its predictions and allows a detailed assessment of its effectiveness. Moreover, we created a private dataset by gathering physiological data from Fitbit devices to ensure reliability and avoid bias. The same pre-trained models were then applied to this private dataset, and the results were documented. Our best-performing model, based on CatBoost, outperformed previous studies with an accuracy of 85% on the publicly available dataset. Furthermore, the same pre-trained CatBoost model achieved an accuracy of 81% on the private dataset. The source code is available at https://github.com/OpenUAE-LAB/Covid-19-detection-using-Wearable-data.git .
Keywords