Informatics in Medicine Unlocked (Jan 2023)
Construction of a nomogram for predicting COVID-19 in-hospital mortality: A machine learning analysis
Abstract
Background and objectives: We aim to verify whether machine learning (ML) algorithms can predict patient outcome from a relatively small dataset, and to create a nomogram to assess the in-hospital mortality of patients with COVID-19.

Methods: A database of 200 COVID-19 patients admitted to the Clinical Hospital of the State University of Campinas (UNICAMP) was used in this analysis. Patient features were divided into three categories: clinical data, chest abnormalities, and body composition characteristics acquired by computed tomography (CT). These features were evaluated both independently and in combination to predict patient outcomes. To minimize performance fluctuations due to the small sample size, reduce possible bias related to outliers, and evaluate the uncertainty arising from the small dataset, we developed a shuffling technique, a modified version of Monte Carlo cross-validation, that creates several training subgroups and complementary testing subgroups. The following ML algorithms were tested: random forest, boosted decision trees, logistic regression, support vector machines, and neural networks. Performance was evaluated by analyzing receiver operating characteristic (ROC) curves. The importance of each feature for outcome prediction was also studied, and a nomogram was created based on the most important features selected by the exclusion test.

Results: Among the different feature sets, the clinical variables age, lymphocyte count, and weight were the most valuable for prognosis prediction. However, skeletal muscle radiodensity and the presence of pleural effusion were also important for outcome determination. A model integrating these independent predictors accurately predicted in-hospital mortality among COVID-19 patients, and a nomogram based on these five features was created to predict COVID-19 mortality in hospitalized patients. The area under the ROC curve (AUC) was 0.86 ± 0.04.
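The validation and feature-ranking steps described under Methods can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: it uses ordinary Monte Carlo cross-validation (repeated stratified random train/test splits), because the abstract does not describe how the authors modified it; it fits only logistic regression, one of the five algorithms listed; and it assumes the exclusion test removes one feature at a time and records the resulting drop in mean test-set AUC. All function names and defaults (50 splits, 30% test fraction, learning rate, epochs) are hypothetical choices.

```python
# Sketch only: plain Monte Carlo CV with logistic regression and a
# leave-one-feature-out "exclusion test". The paper's modified shuffling
# scheme and its other four algorithms are NOT reproduced here.
import math
import random
import statistics


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=200):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def roc_auc(scores, labels):
    # AUC as the Mann-Whitney probability that a random positive outranks a random negative.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(y, test_fraction, rng):
    # Random train/test split that keeps both outcomes in every test set.
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def standardize(X_train, X_test):
    # Scale with training-set statistics only, so no test information leaks in.
    cols = list(zip(*X_train))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) or 1.0 for c in cols]

    def scale(X):
        return [[(x - m) / s for x, m, s in zip(row, mu, sd)] for row in X]

    return scale(X_train), scale(X_test)


def monte_carlo_cv_auc(X, y, n_splits=50, test_fraction=0.3, seed=0):
    # Repeated random splits; returns mean and sample SD of test-set AUC (needs n_splits >= 2).
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_splits):
        tr, te = stratified_split(y, test_fraction, rng)
        X_tr, X_te = standardize([X[i] for i in tr], [X[i] for i in te])
        w, b = train_logistic(X_tr, [y[i] for i in tr])
        scores = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X_te]
        aucs.append(roc_auc(scores, [y[i] for i in te]))
    return statistics.fmean(aucs), statistics.stdev(aucs)


def exclusion_test(X, y, names, **cv_kwargs):
    # Drop one feature at a time; a larger AUC loss suggests a more important feature.
    # The same seed gives every reduced model the same splits as the full model.
    baseline, _ = monte_carlo_cv_auc(X, y, **cv_kwargs)
    losses = {}
    for j, name in enumerate(names):
        reduced = [list(row[:j]) + list(row[j + 1:]) for row in X]
        losses[name] = baseline - monte_carlo_cv_auc(reduced, y, **cv_kwargs)[0]
    return baseline, losses
```

Calling `monte_carlo_cv_auc(X, y)` on a list of feature rows `X` and 0/1 outcomes `y` returns the AUC as a mean and standard deviation across splits, the same form in which the abstract reports its result. Averaging over many random splits is what damps the split-to-split fluctuation that a single hold-out set would show with only 200 patients.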
Conclusion: ML algorithms can reliably predict COVID-19-related in-hospital mortality, even when trained on a relatively small dataset. The success of ML techniques with smaller datasets broadens the applicability of these methods to many problems in medicine. In addition, feature importance analysis allowed us to identify the most important variables for the prediction tasks, resulting in a nomogram with good accuracy and clinical utility for predicting COVID-19 in-hospital mortality.
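The abstract does not state which model underlies the nomogram. Nomograms are conventionally drawn from a logistic-regression fit, so the sketch below assumes that: each predictor's contribution β_j·x_j to the log-odds is rescaled to a points axis on which the predictor with the widest log-odds range spans 0 to 100 points, and the total points are mapped back to a probability through the logistic function. `Predictor` and `Nomogram` are hypothetical names, and any coefficients passed in would have to come from a fitted model; the abstract reports none.

```python
# Sketch only: points scaling for a nomogram, ASSUMING a logistic-regression model.
import math
from dataclasses import dataclass


@dataclass
class Predictor:
    name: str
    coef: float  # log-odds change per unit of the predictor (from a fitted model)
    low: float   # smallest value shown on this predictor's axis
    high: float  # largest value shown on this predictor's axis

    @property
    def ref(self) -> float:
        # Axis value that contributes least to the log-odds (scored as 0 points).
        return self.low if self.coef >= 0 else self.high

    @property
    def span(self) -> float:
        # Range of log-odds this predictor can contribute across its axis.
        return abs(self.coef) * (self.high - self.low)


class Nomogram:
    def __init__(self, intercept, predictors, max_points=100.0):
        self.intercept = intercept
        self.predictors = {p.name: p for p in predictors}
        # Log-odds per point: the widest-ranging predictor spans exactly max_points.
        self.logodds_per_point = max(p.span for p in predictors) / max_points

    def points(self, name, value):
        p = self.predictors[name]
        return p.coef * (value - p.ref) / self.logodds_per_point

    def total_points(self, values):
        # values: dict mapping every predictor name to the patient's value.
        return sum(self.points(name, v) for name, v in values.items())

    def probability(self, total):
        # Convert total points back to the linear predictor, then apply the logistic function.
        baseline = self.intercept + sum(p.coef * p.ref for p in self.predictors.values())
        lp = baseline + total * self.logodds_per_point
        return 1.0 / (1.0 + math.exp(-lp))
```

Because total points convert back to the model's linear predictor exactly, reading a probability off the nomogram reproduces the underlying logistic model, provided every predictor is scored.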