Heliyon (Mar 2024)
Machine learning for the early prediction of acute respiratory distress syndrome (ARDS) in patients with sepsis in the ICU based on clinical data
Abstract
Background: Acute respiratory distress syndrome (ARDS) is a life-threatening complication of severe sepsis. Machine learning models can help predict ARDS accurately and early in patients with sepsis.
Objective: We aimed to develop a machine learning model for predicting ARDS in patients with sepsis in the intensive care unit (ICU).
Methods: Initial clinical data from patients with sepsis admitted to the hospital (including population characteristics, clinical diagnoses, complications, and laboratory tests) were used to predict ARDS and to screen for the crucial variables. Eight algorithms were compared, namely XGBoost, logistic regression, LightGBM, random forest, Gaussian naive Bayes (Gaussian NB), complement naive Bayes (complement NB), support vector machine (SVM), and K-nearest neighbors (KNN), and the prediction model was rebuilt with the best-performing one. For this remodeling, 10% of the data were randomly held out as a test set, and the remainder was used for training with cross-validation. Model performance was evaluated using the area under the curve (AUC), sensitivity, accuracy, specificity, positive and negative predictive values, F1 score, kappa value, and clinical decision curve. Finally, the application of the model was illustrated using the SHAP package.
Results: Ten critical features were selected using the lasso method, namely PaO2/PAO2, A-aDO2, PO2(T), CRP, gender, PO2, RDW, MCH, SG, and chlorine. The variable ranking showed that PaO2/PAO2 was the most important variable. Among the eight algorithms, Gaussian NB performed significantly better than the others. After remodeling with Gaussian NB, the AUCs in the training and validation sets were 0.777 and 0.770, respectively, and the model performed well in the test set (AUC = 0.781, accuracy = 78.6%, sensitivity = 82.4%, F1 score = 0.824). A comparison of the overlapping factors with those of previous models indicated that the model we developed performs better.
Conclusion: Sepsis-associated ARDS can be accurately predicted at an early stage by a machine learning model based on existing clinical data. These findings may help to identify patients with sepsis-associated ARDS accurately and to improve their prognosis.
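The evaluation metrics named in the Methods (AUC, sensitivity, specificity, accuracy, positive and negative predictive values, F1 score, and kappa) can all be derived from predicted labels and scores. The following is a minimal sketch under stated assumptions, not the study's code: labels are assumed to be coded 1 = ARDS and 0 = no ARDS, the function names `classification_metrics` and `roc_auc` and the toy data are hypothetical, and the AUC uses the standard pairwise (rank-based) definition, which may differ from the implementation the authors used.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (assumed: 1 = ARDS, 0 = no ARDS)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    n = tp + fp + tn + fn
    accuracy = _ratio(tp + tn, n)
    # Agreement expected by chance, from the row and column totals (Cohen's kappa).
    p_chance = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": accuracy,
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "kappa": _ratio(accuracy - p_chance, 1 - p_chance),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win. The pairwise loop is O(n_pos * n_neg), which is
    acceptable for a sketch but slow for large cohorts.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only; these are not values from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
    print(classification_metrics(y_true, y_pred))
    print("AUC:", roc_auc(y_true, scores))
```

Note that the AUC depends only on the ranking of the predicted scores, whereas the other metrics depend on the classification threshold, so reported sensitivity and specificity are meaningful only together with the threshold that produced them.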