Physical Review Physics Education Research (Jul 2018)

Multilevel Rasch modeling of two-tier multiple choice test: A case study using Lawson’s classroom test of scientific reasoning

  • Yang Xiao,
  • Jing Han,
  • Kathleen Koenig,
  • Jianwen Xiong,
  • Lei Bao

DOI
https://doi.org/10.1103/PhysRevPhysEducRes.14.020104
Journal volume & issue
Vol. 14, no. 2
p. 020104

Abstract


Assessment instruments composed of two-tier multiple choice (TTMC) items are widely used in science education as an effective method to evaluate students’ sophisticated understanding. In practice, however, there are often concerns regarding the common scoring methods of TTMC items, which include pair scoring and individual scoring schemes. The pair-scoring method is effective in suppressing “false positives” at the cost of missing possible middle states in the progression of student understanding. On the other hand, the individual scoring method captures an undistinguished middle level but is prone to rewarding guessing, which leads to “false positives”. In addition, this middle level does not discriminate between knowing the result and explaining the reason, which limits the capacity to draw meaningful implications from the assessment outcomes. To address the concerns with the current scoring methods, it is valuable to explore new scoring methods that can fully utilize the information measured with TTMC items. In this study, a number of scoring models are studied using Rasch analysis on data from a popular TTMC test, the Lawson classroom test of scientific reasoning (LCTSR), collected from four considerably different populations. The results show that the model fit quality of the scoring methods varies with student population and item design. In general, there is no one-size-fits-all solution; however, given the new information obtained in this study, a three-step process is suggested that can guide the development of new mixed scoring models tailored for a particular population and/or test. The evaluation results show that the mixed models produce the most reliable model fits and better-than-average goodness of fit. Furthermore, the results of this study also confirm previous studies, which suggest that it is harder to come up with a correct explanation than to just know the answer.
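
The difference between the two common scoring schemes can be illustrated with a minimal sketch. It assumes the usual definitions: pair scoring gives one point only when both tiers of an item are correct, and individual scoring gives one point per correct tier. The function names and the sample responses are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# Each TTMC item is represented as (answer_correct, reason_correct).
# This representation is an assumption made for illustration only.
Item = Tuple[bool, bool]


def pair_score(items: List[Item]) -> int:
    """Pair scoring: one point only when both tiers are correct."""
    return sum(1 for answer, reason in items if answer and reason)


def individual_score(items: List[Item]) -> int:
    """Individual scoring: one point per correct tier, scored independently."""
    return sum(int(answer) + int(reason) for answer, reason in items)


# Hypothetical responses covering all four answer/reason patterns.
responses = [(True, True), (True, False), (False, True), (False, False)]

print(pair_score(responses))        # 1
print(individual_score(responses))  # 4
```

Under individual scoring, the patterns (True, False) and (False, True) both earn one point, which is the undistinguished middle level described in the abstract. Under pair scoring both patterns earn zero, so the middle states are lost.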