Language Testing in Asia (Feb 2020)

Analyzing rater severity in a freshman composition course using many facet Rasch measurement

  • Inan Deniz Erguvan
  • Beyza Aksu Dunya

DOI
https://doi.org/10.1186/s40468-020-0098-3
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 20

Abstract

This study examined the rater severity of instructors using a multi-trait rubric in a freshman composition course offered at a private university in Kuwait. The use of standardized multi-trait rubrics is a recent development in this course, and student feedback and the anchor papers provided by instructors for each essay exam necessitated an assessment of rater effects, including severity/leniency and restriction of range in ratings among instructors. Data were collected from three instructors teaching the same course in Summer 2019, who rated their students' first midterm exam essays and shared the scores with the researcher. In addition, two students from each class were randomly selected, and a total of six papers were marked by all instructors for anchoring purposes. The many-facet Rasch model (MFRM) was employed for data analysis. The results showed that although the raters used the rubric consistently across all examinees and tasks, they differed in their degree of leniency and severity and tended to assign scores of 70 and 80 more frequently than the other scores. The study shows that composition instructors may differ in their rating behavior, which may cause dissatisfaction and create a sense of unfairness among the students of severe instructors. The findings are expected to help writing departments monitor inter-rater reliability and the consistency of their ratings. The most practical way to achieve this is by organizing rater training workshops.

Keywords