Journal of Inflammation Research (Feb 2025)

Identifying and Validating Prognostic Hyper-Inflammatory and Hypo-Inflammatory COVID-19 Clinical Phenotypes Using Machine Learning Methods

  • Ji X,
  • Guo Y,
  • Tang L,
  • Gao C

Journal volume & issue
Vol. 18
pp. 3009 – 3024

Abstract

Xiaojing Ji, Yiran Guo, Lujia Tang, Chengjin Gao

Department of Emergency, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, People’s Republic of China

Correspondence: Lujia Tang; Chengjin Gao, Email [email protected]; [email protected]

Background: COVID-19 exhibits complex pathophysiological manifestations, characterized by significant clinical and biological heterogeneity. Identifying phenotypes may enhance our understanding of the disease’s diverse trajectories, benefiting clinical practice and trials.

Methods: This study included adult patients with COVID-19 admitted to Xinhua Hospital, affiliated with Shanghai Jiao Tong University School of Medicine, between December 15, 2022, and February 15, 2023. The k-prototypes clustering method was applied to 50 clinical variables to identify phenotypes. Machine learning algorithms were then used to select key classifier variables for phenotype recognition.

Results: A total of 1376 patients met the inclusion criteria. K-prototypes clustering revealed two distinct subphenotypes: a Hypo-inflammatory subphenotype (824 [59.9%]) and a Hyper-inflammatory subphenotype (552 [40.1%]). Patients in the Hypo-inflammatory subphenotype were younger and predominantly female, with low mortality and shorter hospital stays. In contrast, patients in the Hyper-inflammatory subphenotype were older and predominantly male, and exhibited a hyperinflammatory state with higher mortality and higher rates of organ dysfunction. The AdaBoost model performed best for subphenotype prediction (Accuracy: 0.975, Precision: 0.968, Recall: 0.976, F1: 0.972, AUROC: 0.975). “CRP”, “IL-2R”, “D-dimer”, “ST2”, “BUN”, “NT-proBNP”, “neutrophil percentage”, and “lymphocyte count” were identified as the top-ranked variables in the AdaBoost model.

Conclusion: This analysis identified two phenotypes based on COVID-19 symptoms and comorbidities. These phenotypes can be accurately recognized using machine learning models, with the AdaBoost model being optimal for predicting in-hospital mortality. The variables “CRP”, “IL-2R”, “D-dimer”, “ST2”, “BUN”, “NT-proBNP”, “neutrophil percentage”, and “lymphocyte count” play a significant role in the prediction of subphenotypes. The identified subphenotypes can be used for risk stratification in clinical practice. Patients with the Hyper-inflammatory subphenotype can be closely monitored, and preventive measures such as early admission to the intensive care unit or prophylactic anticoagulation can be taken.

Keywords: COVID-19, subphenotypes, K-prototypes clustering, machine learning, mortality prediction
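
The abstract names k-prototypes clustering but does not describe how it handles mixed numeric and categorical variables. The following is a minimal sketch of the standard formulation, not the authors' implementation: each record's dissimilarity to a prototype is the squared Euclidean distance on its numeric features plus a weight `gamma` times the number of mismatching categorical features. The toy records (two standardized lab values and sex), the initial prototypes, and `gamma = 0.5` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import mean


def mixed_dissimilarity(num, cat, proto_num, proto_cat, gamma):
    """Squared Euclidean distance on numeric features plus gamma times
    the count of mismatching categorical features."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(num, proto_num))
    categorical_part = sum(a != b for a, b in zip(cat, proto_cat))
    return numeric_part + gamma * categorical_part


def k_prototypes(nums, cats, init_idx, gamma=1.0, max_iter=20):
    """Alternate assignment and prototype updates until labels stop changing."""
    # Initial prototypes are copies of the records at init_idx (an assumption;
    # real implementations usually offer several initialization schemes).
    protos = [(list(nums[i]), list(cats[i])) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest prototype.
        new_labels = [
            min(range(len(protos)),
                key=lambda k: mixed_dissimilarity(
                    n, c, protos[k][0], protos[k][1], gamma))
            for n, c in zip(nums, cats)
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: numeric mean and categorical mode per cluster.
        for k in range(len(protos)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:
                continue  # keep the old prototype if a cluster is empty
            proto_num = [mean(col) for col in zip(*(nums[i] for i in members))]
            proto_cat = [Counter(col).most_common(1)[0][0]
                         for col in zip(*(cats[i] for i in members))]
            protos[k] = (proto_num, proto_cat)
    return labels, protos


# Hypothetical toy data: two standardized lab values per patient, plus sex.
nums = [(-1.0, -0.8), (-0.9, -1.1), (-1.2, -0.7),
        (1.1, 0.9), (0.8, 1.2), (1.0, 1.0)]
cats = [("F",), ("F",), ("M",), ("M",), ("M",), ("F",)]

labels, protos = k_prototypes(nums, cats, init_idx=(0, 3), gamma=0.5)
print(labels)
```

In this sketch `gamma` controls how much a categorical mismatch counts relative to numeric distance, so numeric variables would normally be standardized first. The abstract does not report the weighting or initialization the study used.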
