Journal of Medical Education and Curricular Development (Jul 2019)

The Reliability of 2-Station Clerkship Objective Structured Clinical Examinations in Isolation and in Aggregate

  • Aaron W Bernard,
  • Richard Feinn,
  • Gabbriel Ceccolini,
  • Robert Brown,
  • Ilene Rosenberg,
  • Walter Trymbulak,
  • Christine VanCott

DOI
https://doi.org/10.1177/2382120519863443
Journal volume & issue
Vol. 6

Abstract

Background: Most medical schools in the United States report having a 5- to 10-station objective structured clinical examination (OSCE) at the end of the core clerkship phase of the curriculum to assess clinical skills. We set out to investigate an alternative OSCE structure in which each clerkship has a 2-station OSCE. This study aimed to determine the reliability of clerkship OSCEs in isolation, to inform composite clerkship grading, and their reliability in aggregate, as a potential alternative to an end-of-third-year examination. Design: Clerkship OSCE data from the 2017-2018 academic year were analyzed: the generalizability coefficient (ρ²) and index of dependability (φ) were calculated for clerkships in isolation and in aggregate using variance components analysis. Results: In all, 93 students completed all examinations. The average generalizability coefficient for the individual clerkships was .47. Most often, the largest variance component was the interaction between the student and the station, indicating inconsistency in the performance of students between the 2 stations. Aggregate clerkship OSCE analysis demonstrated good reliability for consistency (ρ² = .80). About one-third (33.8%) of the variance can be attributed to students, 8.2% can be attributed to the student-by-clerkship interaction, and 42.6% can be attributed to the student-by-block interaction, indicating that students’ relative performances varied by block. Conclusions: Two-station clerkship OSCEs have poor to fair reliability, and this should inform the weighting of the composite clerkship grade. Aggregating data results in good reliability. The largest source of variance in the aggregate was student by block, suggesting that testing over several blocks may have advantages compared with a single-day examination.
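To make the two reliability indices concrete, the following is a minimal sketch of how ρ² and φ can be estimated from variance components. It assumes a fully crossed student × station design with one score per cell, which is simpler than the study's own design (the study also models clerkship and block); it is not the authors' analysis. The function name `g_study` and the toy scores are hypothetical and chosen only for illustration.

```python
def g_study(scores):
    """Estimate variance components for a crossed student x station design.

    scores: list of rows, one row per student, one column per station,
            with exactly one score per cell (an assumption of this sketch).
    Returns a dict with the variance components, the generalizability
    coefficient (relative decisions) and the index of dependability
    (absolute decisions) for a test with the same number of stations.
    """
    n_p = len(scores)          # number of students (persons)
    n_s = len(scores[0])       # number of stations

    grand = sum(sum(row) for row in scores) / (n_p * n_s)
    person_means = [sum(row) / n_s for row in scores]
    station_means = [sum(row[j] for row in scores) / n_p for j in range(n_s)]

    # Sums of squares for the two-way layout without replication.
    ss_p = n_s * sum((m - grand) ** 2 for m in person_means)
    ss_s = n_p * sum((m - grand) ** 2 for m in station_means)
    ss_res = sum(
        (scores[i][j] - person_means[i] - station_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_s)
    )

    ms_p = ss_p / (n_p - 1)
    ms_s = ss_s / (n_s - 1)
    ms_res = ss_res / ((n_p - 1) * (n_s - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    var_ps = ms_res                              # student x station (plus error)
    var_p = max((ms_p - ms_res) / n_s, 0.0)      # student
    var_s = max((ms_s - ms_res) / n_p, 0.0)      # station

    rel_error = var_ps / n_s                     # error for relative decisions
    abs_error = (var_s + var_ps) / n_s           # error for absolute decisions

    # Note: both ratios are undefined if every score is identical.
    return {
        "var_student": var_p,
        "var_station": var_s,
        "var_student_x_station": var_ps,
        "g_coefficient": var_p / (var_p + rel_error),
        "phi": var_p / (var_p + abs_error),
    }


# Made-up scores for 3 students on 2 stations (not data from the study).
toy_scores = [[0, 2], [2, 4], [4, 6]]
result = g_study(toy_scores)
print(result["g_coefficient"], result["phi"])
```

In this toy data the students keep the same rank order on both stations, so the student-by-station component is zero and ρ² equals 1, while φ is lower (0.8) because the stations differ in difficulty. A large student-by-station component, as reported for the individual clerkships, would pull both indices down.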