Automatic Classification Between COVID-19 and Non-COVID-19 Pneumonia Using Symptoms, Comorbidities, and Laboratory Findings: The Khorshid COVID Cohort Study

Hamid Reza Marateb; Farzad Ziaie Nezhad; Mohammad Reza Mohebian; Ramin Sami; Shaghayegh Haghjooy Javanmard; Fatemeh Dehghan Niri; Mahsa Akafzadeh-Savari; Marjan Mansourian; Miquel Angel Mañanas; Martin Wolkewitz; Harald Binder

doi:10.3389/fmed.2021.768467

Frontiers in Medicine (Nov 2021)

Affiliations

Hamid Reza Marateb: The Biomedical Engineering Department, Engineering Faculty, University of Isfahan, Isfahan, Iran
Farzad Ziaie Nezhad: The Biomedical Engineering Department, Engineering Faculty, University of Isfahan, Isfahan, Iran
Mohammad Reza Mohebian: Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, SK, Canada
Ramin Sami: Department of Internal Medicine, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
Shaghayegh Haghjooy Javanmard: Department of Physiology, Applied Physiology Research Center, School of Medicine, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
Fatemeh Dehghan Niri: School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
Mahsa Akafzadeh-Savari: Isfahan Clinical Toxicology Research Center, Isfahan University of Medical Sciences, Isfahan, Iran
Marjan Mansourian: Automatic Control Department (ESAII), Biomedical Engineering Research Centre (CREB), Universitat Politècnica de Catalunya-Barcelona Tech (UPC), Barcelona, Spain
Marjan Mansourian: Department of Epidemiology and Biostatistics, School of Health, Isfahan University of Medical Sciences, Isfahan, Iran
Miquel Angel Mañanas: Automatic Control Department (ESAII), Biomedical Engineering Research Centre (CREB), Universitat Politècnica de Catalunya-Barcelona Tech (UPC), Barcelona, Spain
Miquel Angel Mañanas: Biomedical Research Networking Center in Bioengineering, Biomaterials, and Nanomedicine (CIBER-BBN), Madrid, Spain
Martin Wolkewitz: Faculty of Medicine and Medical Center, Institute of Medical Biometry and Statistics, University of Freiburg, Freiburg, Germany
Harald Binder: Faculty of Medicine and Medical Center, Institute of Medical Biometry and Statistics, University of Freiburg, Freiburg, Germany

Journal volume & issue: Vol. 8

Abstract


Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was a disaster in 2020. Accurate and early diagnosis of COVID-19 remains essential for health policymaking. Reverse transcriptase-polymerase chain reaction (RT-PCR) has served as the operational gold standard for COVID-19 diagnosis. We aimed to design and implement a reliable COVID-19 diagnosis method that estimates the risk of infection from demographics, symptoms and signs, blood markers, and family history of diseases, with excellent agreement with the results obtained by RT-PCR and CT scan. Our study primarily used sample data from a 1-year hospital-based prospective COVID-19 open cohort, the Khorshid COVID Cohort (KCC) study. A sample of 634 patients with COVID-19 and 118 patients with pneumonia of similar characteristics whose RT-PCR and chest CT scan results were negative (the control group) formed dataset 1, which was used to design the system and for internal validation. Two other online datasets, one of symptoms (dataset 2) and one of blood tests (dataset 3), were also analyzed. A combination of one-hot encoding, stability feature selection, over-sampling, and an ensemble classifier was used, and ten-fold stratified cross-validation was performed. The selected features included gender and symptom duration as well as signs and symptoms, blood biomarkers, and comorbidities. Performance indices of the cross-validated confusion matrix for dataset 1 were as follows [95% confidence interval (CI)]: sensitivity of 96% [94–98], specificity of 95% [90–99], positive predictive value (PPV) of 99% [98–100], negative predictive value (NPV) of 82% [76–89], diagnostic odds ratio (DOR) of 496 [198–1,245], area under the ROC curve (AUC) of 0.96 [0.94–0.97], Matthews correlation coefficient (MCC) of 0.87 [0.85–0.88], accuracy of 96% [94–98], and Cohen's kappa of 0.86 [0.81–0.91].
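The abstract names the modeling steps but not how they were implemented. As a rough illustration only, the Python sketch below shows one way ten-fold stratified cross-validation can be combined with over-sampling so that duplicated minority-class (control) records appear only in the training folds. Plain random over-sampling is an assumption (the abstract does not say which over-sampling method was used), the function names `stratified_kfold` and `random_oversample` are hypothetical, and the stability feature selection and ensemble classifier are left as a placeholder comment. Only the class sizes of dataset 1 (634 and 118) come from the study; no patient data are used.

```python
import random
from collections import Counter


def stratified_kfold(labels, k=10, seed=0):
    """Split indices into k test folds, dealing each class round-robin so
    every fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate indices of the smaller classes (sampling with replacement)
    until every class matches the largest one.
    Assumption: plain random over-sampling, not necessarily the paper's method."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(idx) for idx in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced.extend(idx)
        balanced.extend(rng.choice(idx) for _ in range(target - len(idx)))
    return balanced


# Class sizes of dataset 1: 634 COVID-19 patients (1) and 118 controls (0).
labels = [1] * 634 + [0] * 118
folds = stratified_kfold(labels, k=10)

for f, test_idx in enumerate(folds):
    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    # Over-sample the training part only; the test fold stays untouched.
    train_bal = random_oversample(train_idx, labels, seed=f)
    # Hypothetical placeholder: fit feature selection and the ensemble
    # classifier on train_bal, then predict the records in test_idx.
    print(f, Counter(labels[i] for i in test_idx), Counter(labels[i] for i in train_bal))
```

Restricting over-sampling to the training folds keeps copies of a record from landing in the fold it is evaluated on, which would otherwise inflate the cross-validated indices.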
The proposed algorithm showed excellent diagnostic accuracy and class-labeling agreement, and fair discriminant power. The AUC on datasets 2 and 3 was 0.97 [0.96–0.98] and 0.92 [0.91–0.94], respectively. The most important features were white blood cell count, shortness of breath, and C-reactive protein for datasets 1, 2, and 3, respectively. The proposed algorithm is, thus, a promising COVID-19 diagnosis method that could complement simple blood tests and symptom screening. However, RT-PCR and chest CT scan, used here as the gold standard, are not 100% accurate.
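All of the reported indices except the AUC are functions of the cross-validated confusion matrix; the AUC needs the classifier's continuous scores. The Python sketch below gives the standard definitions. The counts and scores in the usage lines are made-up illustrative values, not KCC results, and the confidence intervals reported in the paper are not computed here. The AUC uses the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
import math


def confusion_indices(tp, fn, fp, tn):
    """Binary-classification indices from confusion-matrix counts
    (positive class = COVID-19). DOR is undefined when fp or fn is 0."""
    n = tp + fn + fp + tn
    acc = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from predicted and actual marginals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": acc,
        "dor": (tp * tn) / (fp * fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "kappa": (acc - p_chance) / (1 - p_chance),
    }


def auc_rank(pos_scores, neg_scores):
    """AUC as P(positive score > negative score); ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative (made-up) counts and scores, not taken from the study.
m = confusion_indices(tp=90, fn=10, fp=5, tn=45)
print({k: round(v, 3) for k, v in m.items()})
print(round(auc_rank([0.9, 0.8, 0.4], [0.3, 0.4, 0.1]), 3))
```

With a large positive class, as in dataset 1, PPV stays high even when NPV is noticeably lower, which is why the sketch reports both rather than accuracy alone.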

Published in Frontiers in Medicine

ISSN: 2296-858X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Medicine (General)
Website: http://www.frontiersin.org/journals/medicine
