A Machine Learning Predictive Model of Bloodstream Infection in Hospitalized Patients
Rita Murri,
Giulia De Angelis,
Laura Antenucci,
Barbara Fiori,
Riccardo Rinaldi,
Massimo Fantoni,
Andrea Damiani,
Stefano Patarnello,
Maurizio Sanguinetti,
Vincenzo Valentini,
Brunella Posteraro,
Carlotta Masciocchi
Affiliations
Rita Murri
Dipartimento di Scienze di Laboratorio e Infettivologiche, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
Giulia De Angelis
Dipartimento di Scienze di Laboratorio e Infettivologiche, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
Laura Antenucci
Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
Barbara Fiori
Dipartimento di Scienze di Laboratorio e Infettivologiche, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
Riccardo Rinaldi
Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
Massimo Fantoni
Dipartimento di Scienze di Laboratorio e Infettivologiche, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
Andrea Damiani
Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
Stefano Patarnello
Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
Maurizio Sanguinetti
Dipartimento di Scienze di Laboratorio e Infettivologiche, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
Vincenzo Valentini
Dipartimento di Diagnostica per Immagini, Radioterapia, Oncologia ed Ematologia, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
Brunella Posteraro
Dipartimento di Scienze Biotecnologiche di Base, Cliniche Intensivologiche e Perioperatorie, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
Carlotta Masciocchi
Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
The aim of the study was to build a machine learning-based predictive model to discriminate between hospitalized patients at low risk and those at high risk of bloodstream infection (BSI). A Data Mart including all patients hospitalized between January 2016 and December 2019 with suspected BSI was built. Multivariate logistic regression was chosen to obtain a clinically interpretable predictive model. The model was trained on 2016–2018 data and validated on 2019 data. Candidate predictors of BSI were first screened with univariate logistic regression. A multivariate logistic regression with stepwise feature selection in five-fold cross-validation was then fitted to express the risk of BSI. A total of 5660 hospitalizations (4026 in the training subset and 1634 in the validation subset) were included. Eleven predictors of BSI were identified. The model achieved an area under the receiver operating characteristic curve (AUROC) of 0.74. Based on the interquartile range of the predicted risk score, 508 (31.1%) hospitalizations in the validation subset were classified as low risk, 776 (47.5%) as medium risk, and 350 (21.4%) as high risk of BSI. Of these, 14.2% (72/508), 30.8% (239/776), and 64.0% (224/350) had a BSI, respectively. The performance of the predictive model of BSI is promising. Computational infrastructure and machine learning models can help clinicians identify patients at low risk of BSI, ultimately supporting an antibiotic stewardship approach.
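The evaluation steps summarized in the abstract (AUROC and risk stratification on the predicted score) can be illustrated with a minimal sketch. It is not the authors' code. The function names are hypothetical, and two points are assumptions rather than statements from the abstract: that "low" means a score below the first quartile and "high" a score above the third quartile, and that the quartiles are taken from a reference distribution such as the training-set scores. Only the stratum counts in `REPORTED` come from the abstract.

```python
from statistics import quantiles


def auroc(labels, scores):
    """AUROC as the fraction of (BSI, non-BSI) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def risk_cutoffs(reference_scores):
    """Return (Q1, Q3) of a reference score distribution (assumed here: training scores)."""
    q1, _median, q3 = quantiles(reference_scores, n=4)
    return q1, q3


def stratify(score, q1, q3):
    """Assumed rule: below Q1 is low risk, above Q3 is high risk, otherwise medium."""
    if score < q1:
        return "low"
    if score > q3:
        return "high"
    return "medium"


def observed_rates(labels, scores, q1, q3):
    """Per stratum: (BSI count, hospitalizations, observed BSI rate)."""
    tally = {"low": [0, 0], "medium": [0, 0], "high": [0, 0]}
    for y, s in zip(labels, scores):
        cell = tally[stratify(s, q1, q3)]
        cell[0] += y
        cell[1] += 1
    return {k: (e, n, e / n if n else float("nan")) for k, (e, n) in tally.items()}


# Counts reported in the abstract for the 1634 hospitalizations in the
# validation subset: (BSI cases, hospitalizations) per risk stratum.
REPORTED = {"low": (72, 508), "medium": (239, 776), "high": (224, 350)}

if __name__ == "__main__":
    for stratum, (events, n) in REPORTED.items():
        print(f"{stratum}: {events}/{n} = {100 * events / n:.1f}%")
```

Running the script recomputes the per-stratum BSI rates from the reported counts. The three strata sum to 1634 hospitalizations, of which 535 (32.7%) had a BSI, so the low-risk stratum sits well below and the high-risk stratum well above the overall rate.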