Acta Orthopaedica (Jul 2021)

Availability and reporting quality of external validations of machine-learning prediction models with orthopedic surgical outcomes: a systematic review

  • Olivier Q Groot,
  • Bas J J Bindels,
  • Paul T Ogink,
  • Neal D Kapoor,
  • Peter K Twining,
  • Austin K Collins,
  • Michiel E R Bongers,
  • Amanda Lans,
  • Jacobien H F Oosterhoff,
  • Aditya V Karhade,
  • Jorrit-Jan Verlaan,
  • Joseph H Schwab

DOI
https://doi.org/10.1080/17453674.2021.1910448
Journal volume & issue
Vol. 92, no. 4
pp. 385–393

Abstract

Background and purpose: External validation of machine learning (ML) prediction models is an essential step before clinical application. We assessed the proportion, performance, and transparent reporting of externally validated ML prediction models in orthopedic surgery, using the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines.

Material and methods: We performed a systematic search using synonyms for every orthopedic specialty, ML, and external validation. The proportion of externally validated models was determined against the 59 ML prediction models for orthopedic surgical outcomes, all with only internal validation and published up to June 18, 2020, that our group had previously identified. Model performance was evaluated using discrimination, calibration, and decision-curve analysis. Transparent reporting was assessed against the TRIPOD guidelines.

Results: After screening 4,682 studies, we included 18 studies that externally validated 10 of the 59 available ML prediction models. All external validations identified in this review retained good discrimination. Other key performance measures were provided in only 3 studies, which made overall performance evaluation difficult. The overall median TRIPOD completeness was 61% (IQR 43–89), with 6 items reported in fewer than 4 of the 18 studies.

Interpretation: Most current predictive ML models are not externally validated. The 18 available external validation studies were characterized by incomplete reporting of performance measures, limiting a transparent examination of model performance. Further prospective studies that adhere to existing guidelines are needed to validate or refute the many predictive ML models in orthopedics. Doing so would help ensure that clinicians can take full advantage of validated and clinically implementable ML decision tools.
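The three performance measures named in the methods can be illustrated with a minimal Python sketch. This is not the authors' code, and every function name and the toy data are hypothetical. Discrimination is shown as the c-statistic (area under the ROC curve), calibration as the simplified "calibration-in-the-large" difference rather than a full calibration plot, slope, or intercept, and decision-curve analysis as the net benefit at a single threshold probability.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: chance that a random event case receives a higher
    predicted risk than a random non-event case (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y_true: Sequence[int],
                             y_prob: Sequence[float]) -> float:
    """Simplified calibration check (an assumption of this sketch):
    mean predicted risk minus observed event rate. Zero means the model
    neither over- nor under-predicts on average."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Decision-curve analysis at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical external-validation cohort: outcomes and predicted risks.
y_toy = [0, 0, 1, 1, 0, 1, 0, 1]
p_toy = [0.1, 0.3, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

print("c-statistic:", c_statistic(y_toy, p_toy))
print("calibration-in-the-large:", calibration_in_the_large(y_toy, p_toy))
print("net benefit at 0.5:", net_benefit(y_toy, p_toy, 0.5))
```

A full decision curve would repeat the net-benefit calculation over a range of thresholds and compare the model against the treat-all and treat-none strategies.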