Scientific Reports (Apr 2022)
Machine learning model from a Spanish cohort for prediction of SARS-COV-2 mortality risk and critical patients
Abstract
Abstract Patients affected by SARS-COV-2 have collapsed healthcare systems around the world. Consequently, different challenges arise regarding the prediction of hospital needs, optimization of resources, diagnostic triage tools and patient evolution, as well as tools that allow us to analyze which are the factors that determine the severity of patients. Currently, it is widely accepted that one of the problems since the pandemic appeared was to detect (i) who patients were about to need Intensive Care Unit (ICU) and (ii) who ones were about not overcome the disease. These critical patients collapsed Hospitals to the point that many surgeries around the world had to be cancelled. Therefore, the aim of this paper is to provide a Machine Learning (ML) model that helps us to prevent when a patient is about to be critical. Although we are in the era of data, regarding the SARS-COV-2 patients, there are currently few tools and solutions that help medical professionals to predict the evolution of patients in order to improve their treatment and the needs of critical resources at hospitals. Moreover, most of these tools have been created from small populations and/or Chinese populations, which carries a high risk of bias. In this paper, we present a model, based on ML techniques, based on 5378 Spanish patients’ data from which a quality cohort of 1201 was extracted to train the model. Our model is capable of predicting the probability of death of patients with SARS-COV-2 based on age, sex and comorbidities of the patient. It also allows what-if analysis, with the inclusion of comorbidities that the patient may develop during the SARS-COV-2 infection. For the training of the model, we have followed an agnostic approach. We explored all the active comorbidities during the SARS-COV-2 infection of the patients with the objective that the model weights the effect of each comorbidity on the patient’s evolution according to the data available. The model has been validated by using stratified cross-validation with k = 5 to prevent class imbalance. We obtained robust results, presenting a high hit rate, with 84.16% accuracy, 83.33% sensitivity, and an Area Under the Curve (AUC) of 0.871. The main advantage of our model, in addition to its high success rate, is that it can be used with medical records in order to predict their diagnosis, allowing the critical population to be identified in advance. Furthermore, it uses the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD 9-CM) standard. In this sense, we should also emphasize that those hospitals using other encodings can add an intermediate layer business to business (B2B) with the aim of making transformations to the same international format.