Scientific Reports (Oct 2024)
Multiple imputation integrated to machine learning: predicting post-stroke recovery of ambulation after intensive inpatient rehabilitation
Abstract
Good data quality is vital for personalising plans in rehabilitation. Machine learning (ML) improves prognostics, but integrating it with Multiple Imputation (MImp) for dealing with missingness is an unexplored field. This work aims to provide post-stroke ambulation prognosis by integrating MImp with ML, and to identify the influential prognostic factors. Stroke survivors in intensive rehabilitation were enrolled. Data on demographics, events, and clinical, physiotherapy, and psycho-social assessments were collected. The outcome was independent ambulation at discharge, assessed using the Functional Ambulation Category scale. After handling missingness using MImp, ML models were optimised, cross-validated, and tested. Interpretability techniques were used to analyse predictor contributions. Pre-MImp, the dataset included 54.1% women and 79.2% ischaemic patients, with a median age of 80.0 (interquartile range: 15.0). Post-MImp, 368 non-ambulatory patients on 10 imputed datasets were used for training and 80 for testing. The random forest, the best-performing algorithm in validation, obtained 75.5% aggregated balanced accuracy on the test set. The main predictors included modified Barthel index, Fugl-Meyer assessment/motricity index, short physical performance battery, age, Charlson comorbidity index/cumulative illness rating scale, and trunk control test. This is among the first studies applying ML together with MImp to predict ambulation recovery in post-stroke rehabilitation. This pipeline reliably exploits the potential of incomplete datasets for healthcare prognosis, identifying relevant predictors.
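To make the phrase "aggregated balanced accuracy" concrete, the following is a minimal sketch of how predictions from several imputed copies of a test set could be pooled and scored. The abstract does not state the aggregation rule, so the majority vote used here is an assumption, as are the function names `majority_vote` and `balanced_accuracy` and the toy labels, which are illustrative and not study data. Balanced accuracy is computed as the mean of per-class recall.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (for two classes: mean of sensitivity and specificity)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def majority_vote(predictions_per_imputation):
    """Pool predictions patient by patient across imputed datasets.

    ASSUMPTION: majority voting is one plausible aggregation rule; the
    abstract does not say which rule was used. Ties go to the smallest label.
    """
    n_patients = len(predictions_per_imputation[0])
    pooled = []
    for i in range(n_patients):
        votes = Counter(p[i] for p in predictions_per_imputation)
        top = max(votes.values())
        pooled.append(min(label for label, count in votes.items() if count == top))
    return pooled


# Toy example (hypothetical labels): 1 = ambulatory at discharge, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
# One prediction vector per imputed copy of the same test patients.
preds = [
    [1, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]

pooled = majority_vote(preds)
score = balanced_accuracy(y_true, pooled)
print(pooled, round(score, 4))
```

Balanced accuracy is a natural choice when the outcome classes are unequal in size, because each class contributes equally to the score regardless of how many patients it contains.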
Keywords