Journal of Inflammation Research (Feb 2025)
Identifying and Validating Prognostic Hyper-Inflammatory and Hypo-Inflammatory COVID-19 Clinical Phenotypes Using Machine Learning Methods
Abstract
Xiaojing Ji, Yiran Guo, Lujia Tang, Chengjin Gao
Department of Emergency, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, People’s Republic of China
Correspondence: Lujia Tang; Chengjin Gao, Email [email protected]; [email protected]

COVID-19 exhibits complex pathophysiological manifestations, characterized by significant clinical and biological heterogeneity. Identifying phenotypes may enhance our understanding of the disease’s diverse trajectories, benefiting clinical practice and trials.

Methods: This study included adult patients with COVID-19 from Xinhua Hospital, affiliated with Shanghai Jiao Tong University School of Medicine, between December 15, 2022, and February 15, 2023. The k-prototypes clustering method was applied to 50 clinical variables to identify phenotypes. Machine learning algorithms were then applied to select key classifier variables for phenotype recognition.

Results: A total of 1376 patients met the inclusion criteria. K-prototypes clustering revealed two distinct subphenotypes: a hypo-inflammatory subphenotype (824 [59.9%]) and a hyper-inflammatory subphenotype (552 [40.1%]). Patients in the hypo-inflammatory subphenotype were younger and predominantly female, with low mortality and shorter hospital stays. In contrast, patients in the hyper-inflammatory subphenotype were older and predominantly male, exhibiting a hyperinflammatory state with higher mortality and higher rates of organ dysfunction. The AdaBoost model performed best for subphenotype prediction (Accuracy: 0.975, Precision: 0.968, Recall: 0.976, F1: 0.972, AUROC: 0.975). “CRP”, “IL-2R”, “D-dimer”, “ST2”, “BUN”, “NT-proBNP”, “neutrophil percentage”, and “lymphocyte count” were identified as the top-ranked variables in the AdaBoost model.

Conclusion: This analysis identified two phenotypes based on COVID-19 symptoms and comorbidities. These phenotypes can be accurately recognized using machine learning models, with the AdaBoost model being optimal for predicting in-hospital mortality. The variables “CRP”, “IL-2R”, “D-dimer”, “ST2”, “BUN”, “NT-proBNP”, “neutrophil percentage”, and “lymphocyte count” play a significant role in the prediction of subphenotypes. The identified subphenotypes can be used for risk stratification in clinical practice. Patients with the hyper-inflammatory subphenotype can be closely monitored, and preventive measures such as early admission to the intensive care unit or prophylactic anticoagulation can be taken.

Keywords: COVID-19, subphenotypes, K-prototypes clustering, machine learning, mortality prediction
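
The k-prototypes clustering named in the Methods handles mixed data by combining a squared Euclidean distance on numeric variables with a weighted count of mismatches on categorical variables. The following is a minimal sketch of that general technique in plain Python, not the authors' implementation. The toy rows, the two numeric columns (imagined as standardized laboratory values), the single categorical column, the weight `gamma`, and the fixed starting rows `init_idx` are all illustrative assumptions that do not come from the study.

```python
from collections import Counter
from statistics import mean


def dissimilarity(num_a, cat_a, num_b, cat_b, gamma):
    """Mixed-type distance: squared Euclidean on numeric features plus
    gamma times the number of mismatched categorical features."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(x != y for x, y in zip(cat_a, cat_b))
    return numeric + gamma * mismatches


def k_prototypes(numeric, categorical, init_idx, gamma=1.0, max_iter=20):
    """Cluster rows given as parallel lists of numeric and categorical tuples.

    init_idx lists the row indices used as starting prototypes
    (fixed here so the sketch is deterministic).
    """
    protos = [(numeric[i], categorical[i]) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest prototype.
        new_labels = [
            min(range(len(protos)),
                key=lambda k: dissimilarity(n, c, protos[k][0], protos[k][1], gamma))
            for n, c in zip(numeric, categorical)
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: numeric mean and categorical mode per cluster.
        for k in range(len(protos)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:
                continue  # keep the old prototype for an empty cluster
            num_proto = tuple(mean(col) for col in zip(*(numeric[i] for i in members)))
            cat_proto = tuple(Counter(col).most_common(1)[0][0]
                              for col in zip(*(categorical[i] for i in members)))
            protos[k] = (num_proto, cat_proto)
    return labels, protos


# Toy, made-up rows: two standardized numeric values and one categorical value.
toy_numeric = [(-1.0, -0.8), (-0.9, -1.1), (-1.2, -0.7),
               (1.1, 0.9), (0.8, 1.2), (1.3, 1.0)]
toy_categorical = [("F",), ("F",), ("M",), ("M",), ("M",), ("F",)]

labels, protos = k_prototypes(toy_numeric, toy_categorical, init_idx=[0, 3])
print(labels)
```

In practice the weight `gamma` controls how much a categorical mismatch counts relative to numeric spread, so numeric variables are usually standardized first; a real analysis would also try several random initializations rather than fixed starting rows.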