BMC Medical Education (Apr 2021)

Reliability of simulation-based assessment for practicing physicians: performance is context-specific

  • Elizabeth Sinz,
  • Arna Banerjee,
  • Randolph Steadman,
  • Matthew S. Shotwell,
  • Jason Slagle,
  • William R. McIvor,
  • Laurence Torsher,
  • Amanda Burden,
  • Jeffrey B. Cooper,
  • Samuel DeMaria,
  • Adam I. Levine,
  • Christine Park,
  • David M. Gaba,
  • Matthew B. Weinger,
  • John R. Boulet

DOI
https://doi.org/10.1186/s12909-021-02617-8
Journal volume & issue
Vol. 21, no. 1
pp. 1–9

Abstract


Introduction

Even physicians who routinely work in complex, dynamic practices may be unprepared to optimally manage challenging critical events. High-fidelity simulation can realistically mimic clinically relevant critical events; however, the reliability and validity of simulation-based assessment scores for practicing physicians have not been established.

Methods

Standardised complex simulation scenarios were developed and administered to board-certified, practicing anesthesiologists who volunteered to participate in an assessment study during formative maintenance-of-certification activities. A subset of the study population agreed to participate as the primary responder in a second scenario for this study. The physicians were assessed independently by trained raters on both teamwork/behavioural and technical performance measures. Generalisability and Decision studies were completed for the two scenarios with two raters.

Results

The behavioural score was no more reliable than the technical score. With two raters, more than 20 scenarios would be required to achieve a reliability estimate of 0.7. Increasing the number of raters for a given scenario would have little effect on reliability.

Conclusions

The performance of practicing physicians on simulated critical events may be highly context-specific. Realistic simulation-based assessment for practicing physicians is resource-intensive and may be best suited for individualized formative feedback. More importantly, aggregate data from a population of participants may have an even greater impact if used to identify skill or knowledge gaps to be addressed by training programs and to inform continuing-education improvements across the profession.
