Physical Review Physics Education Research (Mar 2021)
Optimizing the length of computerized adaptive testing for the Force Concept Inventory
Abstract
To shorten the time needed to administer the Force Concept Inventory (FCI), we propose the use of computerized adaptive testing (CAT). CAT is the process of administering a test on a computer, with items (i.e., questions) selected based upon the responses of the examinee to prior items. In this way, the test length can be shortened significantly. As a step toward developing a CAT-based version of the FCI (FCI-CAT), we examined the optimal test length of the FCI-CAT such that the accuracy and precision of Cohen’s d [which we measure in terms of bias, standard error, and root-mean-square error (RMSE)] would be comparable to those of the full FCI for a given class size. First, we estimated the parameters of the FCI items under the three-parameter logistic model of item response theory; these parameters are used by the CAT algorithm. For this estimation, we used 2882 responses of Japanese university students. Second, we conducted a Monte Carlo simulation to analyze how the bias, standard error, and RMSE of Cohen’s d depend upon the test length. Third, we conducted a post hoc simulation to examine whether the Monte Carlo results are consistent with what would have been obtained using empirical responses. For this comparison, we used 86 pairs of pre- and post-test responses of Japanese university students. We found that for a class size of 40, the test length of the FCI-CAT may be reduced to 15–19 items, thereby reducing the test time to 50%–63% of that of the full FCI, with an accompanying decrease in accuracy and precision of only 5%–10%. The results of the Monte Carlo study and the post hoc simulation were consistent, which supports the adequacy of our Monte Carlo study and its relevance to administering the FCI-CAT in real classrooms.
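The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on assumptions not stated in the abstract: the three-parameter logistic model is written without a scaling constant, Cohen's d uses the pooled standard deviation of the pre- and post-test scores, and the standard error is taken as the population standard deviation of the simulated estimates. The function names `p_correct_3pl`, `cohens_d`, and `bias_se_rmse` are hypothetical.

```python
import math
import statistics


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: pseudo-guessing parameter (lower asymptote).
    Assumption: no scaling constant (D = 1).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def cohens_d(pre, post):
    """Cohen's d for two score lists, using the pooled standard deviation
    (an assumed definition; other variants exist)."""
    n1, n2 = len(pre), len(post)
    v1, v2 = statistics.variance(pre), statistics.variance(post)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(post) - statistics.mean(pre)) / pooled_sd


def bias_se_rmse(estimates, true_value):
    """Bias, standard error, and RMSE of repeated estimates of one quantity.

    With the population standard deviation as the standard error,
    rmse**2 equals bias**2 + se**2.
    """
    bias = statistics.mean(estimates) - true_value
    se = statistics.pstdev(estimates)
    rmse = math.sqrt(statistics.mean([(e - true_value) ** 2 for e in estimates]))
    return bias, se, rmse
```

In a Monte Carlo study of the kind described, `bias_se_rmse` would be applied to the Cohen's d values obtained from many simulated classes at each candidate test length, with the true value taken from the abilities used to generate the responses.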