Frontiers in Medicine (Nov 2021)

Machine Learning Approach to Predict Positive Screening of Methicillin-Resistant Staphylococcus aureus During Mechanical Ventilation Using Synthetic Dataset From MIMIC-IV Database

  • Yohei Hirano,
  • Keito Shinmoto,
  • Yohei Okada,
  • Kazuhiro Suga,
  • Jeffrey Bombard,
  • Shogo Murahata,
  • Manoj Shrestha,
  • Patrick Ocheja,
  • Aiko Tanaka

DOI
https://doi.org/10.3389/fmed.2021.694520
Journal volume & issue
Vol. 8

Abstract

Read online

Background: Mechanically ventilated patients are susceptible to nosocomial infections such as ventilator-associated pneumonia. To treat ventilated patients with suspected infection, clinicians select appropriate antibiotics. However, decision-making regarding the use of antibiotics for methicillin-resistant Staphylococcus aureus (MRSA) is challenging, because of the lack of evidence-supported criteria. This study aims to derive a machine learning model to predict MRSA as a possible pathogen responsible for infection in mechanically ventilated patients.Methods: Data were collected from the Medical Information Mart for Intensive Care (MIMIC)-IV database (an openly available database of patients treated at the Beth Israel Deaconess Medical Center in the period 2008–2019). Of 26,409 mechanically ventilated patients, 809 were screened for MRSA during the mechanical ventilation period and included in the study. The outcome was positivity to MRSA on screening, which was highly imbalanced in the dataset, with 93.9% positive outcomes. Therefore, after dividing the dataset into a training set (n = 566) and a test set (n = 243) for validation by stratified random sampling with a 7:3 allocation ratio, synthetic datasets with 50% positive outcomes were created by synthetic minority over-sampling for both sets individually (synthetic training set: n = 1,064; synthetic test set: n = 456). Using these synthetic datasets, we trained and validated an XGBoost machine learning model using 28 predictor variables for outcome prediction. Model performance was evaluated by area under the receiver operating characteristic (AUROC), sensitivity, specificity, and other statistical measurements. Feature importance was computed by the Gini method.Results: In validation, the XGBoost model demonstrated reliable outcome prediction with an AUROC value of 0.89 [95% confidence interval (CI): 0.83–0.95]. The model showed a high sensitivity of 0.98 [CI: 0.95–0.99], but a low specificity of 0.47 [CI: 0.41–0.54] and a positive predictive value of 0.65 [CI: 0.62–0.68]. Important predictor variables included admission from the emergency department, insertion of arterial lines, prior quinolone use, hemodialysis, and admission to a surgical intensive care unit.Conclusions: We were able to develop an effective machine learning model to predict positive MRSA screening during mechanical ventilation using synthetic datasets, thus encouraging further research to develop a clinically relevant machine learning model for antibiotics stewardship.

Keywords