Humanities & Social Sciences Communications (Nov 2023)

Rater variability and reliability of constructed response questions in New York state high-stakes tests of English language arts and mathematics: implications for educational assessment policy

  • Jinyan Huang,
  • Patrick B. Whipple

DOI
https://doi.org/10.1057/s41599-023-02385-4
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 10

Abstract


Using generalizability (G-) theory as both theoretical framework and research methodology, this study examined how the current one-rater holistic scoring practice affects the rater variability and reliability of constructed response questions in New York State high-stakes tests: the grades four and six English language arts (ELA) assessments and the grades four and five mathematics assessments. Following the New York State scoring rubrics, 36 ELA constructed response samples (grades four and six) and 72 mathematics constructed response samples (grades four and five) were each marked holistically by ten independent raters, all currently certified as educators in the State of New York. The results indicated that the current single-rater holistic scoring practice would not yield acceptable G-coefficients for any of these assessments. Implications for assessment policy making at the local and state levels are discussed.
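As a rough illustration of why the number of raters matters for a G-coefficient, the sketch below uses the standard relative G-coefficient for a fully crossed person-by-rater design: person variance divided by person variance plus the residual variance averaged over raters. The abstract does not state the study's design or its variance components, so the design, the function name `relative_g_coefficient`, and the numbers here are assumptions made purely for illustration; they are not results from the study.

```python
def relative_g_coefficient(var_person: float, var_residual: float, n_raters: int) -> float:
    """Relative G-coefficient for an assumed crossed person x rater design.

    var_person   -- variance attributable to true differences among examinees
    var_residual -- person-by-rater interaction variance confounded with error
    n_raters     -- number of raters whose scores are averaged
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    # Averaging over more raters shrinks the relative error variance.
    relative_error = var_residual / n_raters
    return var_person / (var_person + relative_error)


# Hypothetical variance components, NOT taken from the study.
var_person = 0.50
var_residual = 0.50

# Coefficient for one to four raters under these assumed components.
coefficients = {
    n: relative_g_coefficient(var_person, var_residual, n) for n in (1, 2, 3, 4)
}

for n, coefficient in coefficients.items():
    print(f"{n} rater(s): {coefficient:.2f}")
```

Under these assumed components the coefficient rises as raters are added, which is the general mechanism behind comparing single-rater scoring with multi-rater alternatives.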