Multiple imputation integrated to machine learning: predicting post-stroke recovery of ambulation after intensive inpatient rehabilitation

Alice Finocchi; Silvia Campagnini; Andrea Mannini; Stefano Doronzio; Marco Baccini; Bahia Hakiki; Donata Bardi; Antonello Grippo; Claudio Macchi; Jorge Navarro Solano; Michela Baccini; Francesca Cecchi

doi:10.1038/s41598-024-74537-8

Scientific Reports (Oct 2024)

Affiliations

Alice Finocchi: IRCCS Fondazione Don Carlo Gnocchi onlus
Silvia Campagnini: IRCCS Fondazione Don Carlo Gnocchi onlus
Andrea Mannini: IRCCS Fondazione Don Carlo Gnocchi onlus
Stefano Doronzio: IRCCS Fondazione Don Carlo Gnocchi onlus
Marco Baccini: IRCCS Fondazione Don Carlo Gnocchi onlus
Bahia Hakiki: IRCCS Fondazione Don Carlo Gnocchi onlus
Donata Bardi: IRCCS Fondazione Don Carlo Gnocchi onlus
Antonello Grippo: IRCCS Fondazione Don Carlo Gnocchi onlus
Claudio Macchi: IRCCS Fondazione Don Carlo Gnocchi onlus
Jorge Navarro Solano: IRCCS Fondazione Don Carlo Gnocchi onlus
Michela Baccini: Department of Statistics, Computer Science, Applications, University of Florence
Francesca Cecchi: IRCCS Fondazione Don Carlo Gnocchi onlus

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 14

Abstract

Good data quality is vital for personalising rehabilitation plans. Machine learning (ML) improves prognostics, but integrating it with Multiple Imputation (MImp) to deal with missing data is an unexplored field. This work aims to provide a post-stroke ambulation prognosis by integrating MImp with ML, and to identify the influential prognostic factors. Stroke survivors admitted to intensive rehabilitation were enrolled. Data on demographics, events, clinical, physiotherapy, and psycho-social assessments were collected. The outcome was independent ambulation at discharge, assessed with the Functional Ambulation Category scale. After missing data were handled with MImp, ML models were optimised, cross-validated, and tested. Interpretability techniques were used to analyse predictor contributions. Pre-MImp, the dataset included 54.1% women and 79.2% patients with ischaemic stroke, with a median age of 80.0 years (interquartile range: 15.0). Post-MImp, 368 non-ambulatory patients across 10 imputed datasets were used for training and 80 for testing. The random forest (the best-performing algorithm in validation) obtained a 75.5% aggregated balanced accuracy on the test set. The main predictors included the modified Barthel index, Fugl-Meyer assessment/motricity index, short physical performance battery, age, Charlson comorbidity index/cumulative illness rating scale, and trunk control test. This is among the first studies applying ML together with MImp to predict ambulation recovery in post-stroke rehabilitation. This pipeline reliably exploits the potential of incomplete datasets for healthcare prognosis and identifies relevant predictors.
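Because every model is trained once per imputed dataset, its test-set predictions have to be combined before a single score can be reported. The abstract does not say how the "aggregated" balanced accuracy was computed, so the following is only a minimal sketch under stated assumptions. It shows two plausible pooling rules: (A) averaging each patient's predicted probability over the M imputed datasets and then thresholding, and (B) scoring each imputed dataset separately and averaging the scores. The function names, the 0.5 cut-off, and the toy probabilities are all hypothetical. The probabilities stand in for random-forest outputs; the imputation and model fitting themselves are not shown (in scikit-learn one might use `IterativeImputer(sample_posterior=True)` and `RandomForestClassifier.predict_proba`, but that is an assumption, not the authors' stated tooling).

```python
from statistics import mean


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; for a binary outcome, (sensitivity + specificity) / 2."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return mean(recalls)


def threshold(probs, cut=0.5):
    """Turn probabilities of independent ambulation into 0/1 predictions (cut-off assumed)."""
    return [1 if p >= cut else 0 for p in probs]


def pool_probabilities(probs_per_imputation):
    """Assumed rule A: average each test patient's probability over the M imputed datasets."""
    m = len(probs_per_imputation)
    return [sum(col) / m for col in zip(*probs_per_imputation)]


def mean_of_scores(y_true, probs_per_imputation):
    """Assumed rule B: score the model from each imputed dataset separately, then average."""
    return mean(balanced_accuracy(y_true, threshold(p)) for p in probs_per_imputation)


if __name__ == "__main__":
    # Hypothetical toy data: 6 test patients (1 = independent ambulation at discharge)
    # and probabilities from models fitted on 3 imputed datasets (the study used 10).
    y_test = [1, 1, 0, 0, 0, 1]
    probs = [
        [0.9, 0.3, 0.2, 0.7, 0.1, 0.8],
        [0.8, 0.4, 0.1, 0.6, 0.3, 0.2],
        [0.7, 0.2, 0.3, 0.8, 0.2, 0.8],
    ]
    pooled_pred = threshold(pool_probabilities(probs))
    print(pooled_pred)  # -> [1, 0, 0, 1, 0, 1]
    print(round(balanced_accuracy(y_test, pooled_pred), 3))  # -> 0.667
    print(round(mean_of_scores(y_test, probs), 3))  # -> 0.611
```

The two rules can disagree, as the toy numbers show. Rule A yields a single pooled prediction per patient, which is convenient when a clinician needs one prognosis. Rule B instead reflects how much performance varies across the imputed datasets.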

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/