Machine Learning for Long COVID Inference Based on the Acute Phase: A Case Study in Healthcare Professionals

Caio B. S. Maior; Sandrely P. Silva; Isis D. Lins; Ana Lisa Gomes; Marcio C. Moura

doi:10.1109/ACCESS.2025.3553783

IEEE Access (Jan 2025)

Affiliations

Caio B. S. Maior: CEERMA-Center for Risk Analysis, Reliability Engineering, and Environmental Modeling, Universidade Federal de Pernambuco, Recife, Brazil
Sandrely P. Silva: CEERMA-Center for Risk Analysis, Reliability Engineering, and Environmental Modeling, Universidade Federal de Pernambuco, Recife, Brazil
Isis D. Lins: CEERMA-Center for Risk Analysis, Reliability Engineering, and Environmental Modeling, Universidade Federal de Pernambuco, Recife, Brazil
Ana Lisa Gomes: Department of Nursing, Vitória Academic Center, Federal University of Pernambuco, Recife, Brazil
Marcio C. Moura: CEERMA-Center for Risk Analysis, Reliability Engineering, and Environmental Modeling, Universidade Federal de Pernambuco, Recife, Brazil

Journal volume & issue: Vol. 13
pp. 54019 – 54027

Abstract

Since 2020, the COVID-19 pandemic has affected global health, economies, and societal structures, drawing considerable attention to new research and awareness. In this context, Long COVID refers to the persistent symptoms some individuals experience following acute infection with the SARS-CoV-2 virus. Early diagnosis is crucial for managing these symptoms and their possible consequences. In this study, we explore several Machine Learning (ML) techniques to estimate the development of Long COVID using data from a serological study of 53 healthcare professionals, including IgA and IgG antibody data, conducted before a COVID-19 vaccine was available. Four cases were analyzed by combining information on specific symptoms (e.g., fever, pain, fatigue, and/or loss of smell and taste), comorbidity, and, in some cases, antibody information. In addition to five ML models (i.e., Random Forest (RF), K-Nearest Neighbors (KNN), Logistic Regression, Support Vector Machine (SVM), and Multilayer Perceptron), we applied dimensionality reduction techniques such as Principal Component Analysis, Linear Discriminant Analysis, and Feature Selection. The feature selection procedure based on specific thresholds (0.1 and 0.2) of the Gini index was the most suitable dimension reduction method, with KNN achieving the best balanced accuracy (greater than 74%) for Case 1, when no antibody data is used, and SVM performing best when only IgG is required. For Cases 3 and 4, which require IgA, RF performance also rises to 82% with the same thresholds in these smaller datasets. These findings emphasize the potential of ML as a decision-support tool for inferring Long COVID symptoms that develop weeks after an acute infection.
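
As a rough illustration of two ideas named in the abstract, the Python sketch below computes balanced accuracy (the mean of per-class recall) and filters features by an importance threshold. It is not the authors' code. The labels, the feature names, the importance scores, the helper names `balanced_accuracy` and `select_features`, and the choice to keep features scoring at or above the threshold are all assumptions made for illustration; the abstract does not specify how the Gini-based scores were computed or compared.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def select_features(importances, threshold):
    """Keep features whose score is at least `threshold` (assumed rule)."""
    return [name for name, score in importances.items() if score >= threshold]


# Made-up labels for illustration only (1 = long COVID, 0 = no long COVID).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)

# Hypothetical importance scores, NOT values reported in the study.
toy_importances = {
    "fatigue": 0.40,
    "fever": 0.25,
    "pain": 0.15,
    "loss_of_smell": 0.12,
    "comorbidity": 0.08,
}
kept_01 = select_features(toy_importances, 0.1)
kept_02 = select_features(toy_importances, 0.2)

print(plain, balanced)
print(kept_01)
print(kept_02)
```

On these toy labels, plain accuracy is 0.9 while balanced accuracy is 0.75, because one of the two positive cases is missed. This gap is why balanced accuracy is a more informative metric than plain accuracy when classes are uneven and the dataset is small.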

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
