Large-scale Assessments in Education (Apr 2021)

Comparing different response time threshold setting methods to detect low effort on a large-scale assessment

  • James Soland,
  • Megan Kuhfeld,
  • Joseph Rios

DOI
https://doi.org/10.1186/s40536-021-00100-w
Journal volume & issue
Vol. 9, no. 1
pp. 1 – 21

Abstract

Low examinee effort is a major threat to valid uses of many test scores. Fortunately, several methods have been developed to detect noneffortful item responses, most of which use response times. To accurately identify noneffortful responses, one must set response time thresholds separating those responses from effortful ones. While other studies have compared the efficacy of different threshold-setting methods, they typically do so using simulated or small-scale data. When large-scale data are used in such studies, they often are not from a computer-adaptive test (CAT), use only a handful of items, or do not comprehensively examine different threshold-setting methods. In this study, we use reading test scores from over 728,923 3rd–8th-grade students in 2,056 schools across the United States taking a CAT consisting of nearly 12,000 items to compare threshold-setting methods. In so doing, we help provide guidance to developers and administrators of large-scale assessments on the tradeoffs involved in using a given method to identify noneffortful responses.

Keywords