BMC Medical Education (Nov 2018)

Borderline grades in high stakes clinical examinations: resolving examiner uncertainty

  • Boaz Shulruf,
  • Barbara-Ann Adelstein,
  • Arvin Damodaran,
  • Peter Harris,
  • Sean Kennedy,
  • Anthony O’Sullivan,
  • Silas Taylor

DOI
https://doi.org/10.1186/s12909-018-1382-0
Journal volume & issue
Vol. 18, no. 1
pp. 1 – 9

Abstract

Background Objective Structured Clinical Exams are used to increase reliability and validity, yet they achieve only a modest level of reliability. This low reliability is due in part to examiner variance, which is greater than the variance among students. This variance often represents indecisiveness at the cut score, with apparent confusion over terms such as “borderline pass”. It is amplified by a well-reported failure to fail.
Methods A borderline grade (meaning performance is neither a clear pass nor a clear fail) was introduced in a high stakes undergraduate medical clinical skills exam to replace a borderline pass grade (which was historically resolved as 50%) on a 4-point scale (distinction, pass, borderline, fail). Each Borderline grade was then resolved into a Pass or Fail grade by a formula referencing the difficulty of the station and the student's performance in the same domain in other stations. Raw pass or fail grades were unaltered. Mean scores and 95% CIs were calculated per station and per domain for the unmodified and the modified scores/grades (results are presented as error bars). To estimate the defensibility of these modifications, a similar analysis was conducted for the Pass (P) and Fail (F) grades that resulted from the modification of the Borderline (B) grades.
Results Of 14,634 observations, 4.69% were Borderline. Application of the formula did not affect the mean scores in each domain, but the failure rate for the exam increased from 0.7% to 4.1%. Examiners and students expressed satisfaction with the Borderline grade, the resolution formula and the outcomes. Mean scores (by station and by domain, respectively) of students whose B grades were modified to P were significantly higher than those of their counterparts whose B grades were modified to F.
Conclusions This study provides a feasible and defensible resolution to situations where the examinee’s performance is neither a clear pass nor a clear fail, demonstrating the application of the resolution of borderline formula in a high stakes exam. It does not create a new performance standard but utilises real data to make judgements about this small number of candidates. This is perceived as a fair approach to Pass/Fail decisions.
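
The comparison described in the Methods (mean scores with 95% CIs for the group whose B grades became P and the group whose B grades became F) can be illustrated with a minimal sketch. The abstract does not state how the intervals were computed, so the normal-approximation interval below is an assumption, the function name `mean_ci95` is hypothetical, and the scores are made up for illustration only; they are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci95(scores):
    """Return (mean, lower, upper) for a normal-approximation 95% CI.

    Assumption: the interval is mean +/- 1.96 * standard error.
    The abstract does not say which interval method the authors used.
    """
    n = len(scores)
    m = mean(scores)
    half_width = 1.96 * stdev(scores) / sqrt(n)
    return m, m - half_width, m + half_width


# Made-up scores for illustration only; NOT data from the study.
resolved_to_pass = [62, 58, 65, 60, 63, 59, 61, 64]  # B grades resolved to P
resolved_to_fail = [48, 52, 45, 50, 47, 51, 49, 46]  # B grades resolved to F

pass_mean, pass_lo, pass_hi = mean_ci95(resolved_to_pass)
fail_mean, fail_lo, fail_hi = mean_ci95(resolved_to_fail)

# Two intervals overlap when each one starts before the other ends.
intervals_overlap = pass_lo <= fail_hi and fail_lo <= pass_hi

print(f"B->P: mean {pass_mean:.2f}, 95% CI ({pass_lo:.2f}, {pass_hi:.2f})")
print(f"B->F: mean {fail_mean:.2f}, 95% CI ({fail_lo:.2f}, {fail_hi:.2f})")
print("Intervals overlap:", intervals_overlap)
```

Non-overlapping intervals on an error-bar plot are a conservative visual indication that two group means differ; a formal significance test would be needed to confirm the difference.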