IEEE Access (Jan 2025)
Machine Learning for Long COVID Inference Based on the Acute Phase: A Case Study in Healthcare Professionals
Abstract
Since 2021, the COVID-19 pandemic has affected global health, economies, and societal structures, drawing considerable research attention and public awareness. In this context, Long COVID refers to the persistent symptoms that some individuals experience following acute infection with the SARS-CoV-2 virus. Early diagnosis is crucial for managing these symptoms and their possible consequences. In this study, we explore several Machine Learning (ML) techniques to estimate the development of Long COVID using data from a serological study of 53 healthcare professionals, including IgA and IgG antibody measurements, conducted before a COVID-19 vaccine was available. Four cases were analyzed by combining information on specific symptoms (e.g., fever, pain, fatigue, and/or loss of smell and taste), comorbidity, and, depending on the case, antibody information. In addition to five ML models, namely Random Forest (RF), K-Nearest Neighbors (KNN), Logistic Regression, Support Vector Machine (SVM), and Multilayer Perceptron, we applied dimensionality reduction techniques such as Principal Component Analysis, Linear Discriminant Analysis, and Feature Selection. Feature selection based on specific thresholds (0.1 and 0.2) of the Gini index was the most suitable dimensionality reduction method, with KNN achieving the best balanced accuracy (greater than 74%) for Case 1, when no antibody information is used, and SVM doing so when only IgG is required. For Cases 3 and 4, which require IgA, RF performance also rises to 82% with the same thresholds in these smaller datasets. These findings emphasize the potential of ML as a decision-support tool for inferring Long COVID symptoms that develop weeks after an acute infection.
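As a rough illustration of two ideas named in the abstract, the sketch below shows threshold-based feature selection and the balanced accuracy metric. It is not the authors' pipeline. The function names, the feature names, the importance scores, and the labels are all invented for illustration, and the choice of keeping features whose score is at least the threshold is an assumption. In practice the importance scores would likely come from a tree-based model (for example, the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`).

```python
from collections import defaultdict


def select_features(importances, threshold):
    """Keep features whose importance score is at least `threshold`.

    Assumption: "at least" (>=) is our choice; the abstract only names
    the thresholds 0.1 and 0.2.
    """
    return [name for name, score in importances.items() if score >= threshold]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which is less sensitive to class imbalance
    than plain accuracy."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical importance scores (NOT values from the study).
toy_importances = {
    "fever": 0.05,
    "fatigue": 0.30,
    "loss_of_smell": 0.25,
    "pain": 0.15,
    "comorbidity": 0.25,
}
selected_01 = select_features(toy_importances, 0.1)
selected_02 = select_features(toy_importances, 0.2)

# Hypothetical labels: 1 = developed Long COVID, 0 = did not.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)  # (3/4 + 1/2) / 2 = 0.625
```

With only 53 participants, a metric that averages recall over classes is a sensible choice, since plain accuracy can look high simply by favoring the majority class.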
Keywords