Admission blood tests predicting survival of SARS-CoV-2 infected patients: a practical implementation of graph convolution network in imbalance dataset

Jie Lian; Fan Huang; Xinhai Huang; Kitty Yu-Yeung Lau; Kei Shing Ng; Carlin Chun Fai Chu; Simon Ching Lam; Mohamad Koohli-Moghadam; Varut Vardhanabhuti

doi:10.1186/s12879-024-09699-x

BMC Infectious Diseases (Aug 2024)

Affiliations

Jie Lian: Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong
Fan Huang: Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong
Xinhai Huang: Faculty of Science, The University of Hong Kong
Kitty Yu-Yeung Lau: WHO Collaborating Centre for Infectious Disease Epidemiology and Control, School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong
Kei Shing Ng: Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong
Carlin Chun Fai Chu: Department of Computing, The Hang Seng University of Hong Kong
Simon Ching Lam: School of Nursing, Tung Wah College
Mohamad Koohli-Moghadam: Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong
Varut Vardhanabhuti: Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong

DOI: https://doi.org/10.1186/s12879-024-09699-x
Journal volume & issue: Vol. 24, no. 1
pp. 1 – 11

Abstract

Background

Predicting an individual's risk of death from COVID-19 is essential for planning and optimising resources. However, because the real-world mortality rate is relatively low, particularly in places such as Hong Kong, the resulting datasets are highly imbalanced, which makes building an accurate prediction model difficult. This study introduces an innovative application of graph convolutional networks (GCNs) to predict the survival of COVID-19 patients from a highly imbalanced dataset. Unlike traditional models, GCNs exploit structural relationships within the data, which can improve predictive accuracy and robustness. By integrating demographic and laboratory data into a GCN framework, our approach addresses class imbalance and demonstrates significant improvements in prediction accuracy.

Methods

The cohort included all consecutive COVID-19-positive patients fulfilling the study criteria who were admitted to 42 public hospitals in Hong Kong between January 23 and December 31, 2020 (n = 7,606). We proposed a population-based GCN model that took blood test results, age and sex as inputs to predict survival outcomes. We compared the proposed model with the Cox proportional hazards (CPH) model, conventional machine learning models, and oversampling machine learning models. Additionally, a subgroup analysis was performed on the test set to better understand the relationship between each patient node and its neighbours and to reveal possible underlying causes of inaccurate predictions.

Results

The GCN model performed best, with an AUC of 0.944, significantly outperforming all other models (p < 0.05), including the oversampled CPH model (0.708), linear regression (0.877), linear discriminant analysis (0.860), k-nearest neighbours (0.834), Gaussian predictor (0.745) and support vector machine (0.847). Kaplan-Meier estimates showed that the GCN model discriminated well between low- and high-risk individuals (p < 0.0001). In a subanalysis using the weighted-in score, the GCN model discriminated well between the different predicted groups, but the separation between the false negative (FN) and true negative (TN) groups was inadequate.

Conclusion

The GCN model considerably outperformed all other machine learning methods and the baseline CPH models. Thus, for this imbalanced COVID-19 survival dataset, a population graph representation may be an effective approach to achieving good prediction.
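The abstract does not state how the population graph is built or how the GCN is trained. The NumPy code below is a minimal sketch of the general mechanics under explicitly assumed choices. Each patient is a node. Edges come from a k-nearest-neighbour graph over standardised features. A single-layer GCN uses the renormalised adjacency of Kipf & Welling. A class-weighted cross-entropy counters the low mortality rate, and AUC is computed on held-out nodes. The helper names (`knn_population_graph`, `normalize_adjacency`, `train_gcn`, `roc_auc`), `k=5`, the learning rate and the synthetic data are all hypothetical illustrations, not the authors' method or data.

```python
import numpy as np

def knn_population_graph(X, k=5):
    """Hypothetical graph construction: symmetric k-nearest-neighbour adjacency over patients."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                  # a patient is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]          # indices of the k closest patients
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)                       # make the graph undirected

def normalize_adjacency(A):
    """Renormalised adjacency D^{-1/2} (A + I) D^{-1/2} with self-loops (Kipf & Welling)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def roc_auc(y, scores):
    """AUC = probability that a random positive outranks a random negative (ties count half)."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def train_gcn(A_norm, X, y, train_mask, lr=0.1, epochs=500):
    """One-layer GCN, sigmoid(A_norm @ X @ w + b), fit by gradient descent on weighted BCE."""
    H = A_norm @ X  # graph convolution: mix each patient's features with its neighbours'
    w, b = np.zeros(X.shape[1]), 0.0
    y_tr, H_tr = y[train_mask], H[train_mask]
    # Assumed imbalance handling: up-weight the rare class so both classes weigh equally.
    n_pos = y_tr.sum()
    n_neg = len(y_tr) - n_pos
    weights = np.where(y_tr == 1, len(y_tr) / (2 * n_pos), len(y_tr) / (2 * n_neg))
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H_tr @ w + b)))
        g = weights * (p - y_tr) / len(y_tr)  # gradient of the weighted BCE w.r.t. the logits
        w -= lr * (H_tr.T @ g)
        b -= lr * g.sum()
    # Transductive setting: every node is scored, including the unlabelled test nodes.
    return 1.0 / (1.0 + np.exp(-(H @ w + b)))

# Synthetic, hypothetical data: 600 patients, 6 standardised features, few positives (deaths).
rng = np.random.default_rng(0)
n, d = 600, 6
X = rng.normal(size=(n, d))
logit = 2.0 * X[:, 0] + 1.0 * X[:, 1] - 4.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
train_mask = rng.random(n) < 0.7
test_mask = ~train_mask

A_norm = normalize_adjacency(knn_population_graph(X, k=5))
scores = train_gcn(A_norm, X, y, train_mask)
auc = roc_auc(y[test_mask], scores[test_mask])
print(f"positives: {y.sum()} of {n}; test AUC = {auc:.3f}")
```

The graph is built over all patients, but only the training nodes contribute to the loss. Test patients still influence their neighbours' propagated features, which is the transductive setting typical of population-graph GCNs. The synthetic data carry no real graph structure, so the printed AUC illustrates the mechanics only and says nothing about the performance reported above.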

Published in BMC Infectious Diseases

ISSN: 1471-2334 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Infectious and parasitic diseases
Website: https://bmcinfectdis.biomedcentral.com