Education Sciences (Mar 2024)

More Evidence of Low Inter-Rater Reliability of a High-Stakes Performance Assessment of Teacher Candidates

  • Scott A. Lyness

DOI
https://doi.org/10.3390/educsci14030300
Journal volume & issue
Vol. 14, no. 3
p. 300

Abstract


From 2010 to 2015, our school of education used the Performance Assessment for California Teachers (PACT), a summative assessment designed to measure preservice teacher competence. Candidate portfolios were uploaded to an evaluation portal, and trained evaluators assigned a final score of Pass or Fail to the work samples. Three consensus estimates of inter-rater reliability (percent agreement and two chance-corrected coefficients, Cohen's kappa and Gwet's AC1) were computed for 181 candidate portfolios that were double- or triple-scored, within five content areas: elementary math, secondary history/social science, math, science, and English language arts. An initial Pass or Fail score did not make a matching score from a subsequent evaluator any more likely. Inter-rater reliability was low across all content areas examined, and none of the percent agreement coefficients reached the 0.700 minimum standard for consensus agreement. Increasing research access to proprietary double-scored data would lead to a better understanding of, and perhaps improvement in, teacher performance assessments.
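The three consensus estimates named above can be illustrated with a minimal sketch for the two-rater, binary Pass/Fail case. This is not the study's code or data; the ratings below are invented, and the formulas are the standard definitions (Cohen's kappa uses the product of the raters' marginal proportions for chance agreement; Gwet's AC1 uses the mean marginal proportion per category).

```python
# Minimal sketch of the three consensus estimates for two raters scoring
# binary Pass/Fail portfolios. Illustration data only, not the study's data.

def percent_agreement(r1, r2):
    """Share of portfolios on which the two raters gave the same score."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Kappa = (pa - pe) / (1 - pe), with chance agreement pe from the
    product of each rater's marginal proportion per category."""
    pa = percent_agreement(r1, r2)
    n = len(r1)
    cats = set(r1) | set(r2)
    pe = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)
    return (pa - pe) / (1 - pe)

def gwets_ac1(r1, r2):
    """AC1 = (pa - pe) / (1 - pe), with pe built from the mean marginal
    proportion pi per category: pe = sum(pi * (1 - pi)) / (K - 1).
    For two categories this reduces to pe = 2 * pi * (1 - pi)."""
    pa = percent_agreement(r1, r2)
    n = len(r1)
    cats = set(r1) | set(r2)
    pis = [(r1.count(c) / n + r2.count(c) / n) / 2 for c in cats]
    pe = sum(p * (1 - p) for p in pis) / (len(pis) - 1)
    return (pa - pe) / (1 - pe)

# Invented double-scored ratings for ten portfolios:
rater1 = list("PPPFFPFPFP")
rater2 = list("PFPFPPFFFP")
print(percent_agreement(rater1, rater2))  # 0.7
print(cohens_kappa(rater1, rater2))
print(gwets_ac1(rater1, rater2))
```

Because kappa's chance correction depends on the marginal Pass rates while AC1's does not, the two coefficients can diverge on the same data, which is one reason studies of this kind report both.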
