Frontiers in Psychology (Nov 2019)

Sample Size Requirements for Applying Mixed Polytomous Item Response Models: Results of a Monte Carlo Simulation Study

  • Tanja Kutscher,
  • Michael Eid,
  • Claudia Crayen

DOI
https://doi.org/10.3389/fpsyg.2019.02494
Journal volume & issue
Vol. 10

Abstract

Mixture models of item response theory (IRT) can be used to detect inappropriate category use. Data collected in panel surveys, where attitudes and traits are typically assessed with short scales offering many response categories, are prone to response styles that indicate inappropriate category use. Applying mixed IRT models to this type of data can be challenging, however, because of the many threshold parameters within items. To date, little is known about the sample size required in this case for adequate performance of the estimation methods and of the goodness-of-fit criteria of mixed IRT models. The present Monte Carlo simulation study examined these issues for two mixed IRT models [the restricted mixed generalized partial credit model (rmGPCM) and the mixed partial credit model (mPCM)]. The population parameters of the simulation were taken from a challenging real application to survey data (a 5-item scale with an 11-point rating scale and three latent classes). Additional data conditions (e.g., long tests, a reduced number of response categories, and a simple latent mixture) were included to improve the generalizability of the results. Under the challenging data condition, data were generated for each model with sample sizes varying from 500 to 5,000 observations in steps of 500. For the additional conditions, only three sample sizes (1,000, 2,500, and 4,500 observations) were examined. The effect of sample size on estimation problems and on the accuracy of parameter and standard error estimates was evaluated. The results show that, under the challenging data condition, both mixed IRT models require at least 2,500 observations to provide accurate parameter and standard error estimates. The rmGPCM produces more estimation problems than the more parsimonious mPCM, mostly because of the sparse tables arising from the many response categories. Both models show similar trends in estimation accuracy across sample sizes. Under the additional conditions, no estimation problems are observed; both models perform well with smaller samples when long tests are used or when the true latent mixture includes two classes. For model selection, the AIC3 and the SABIC are the most reliable information criteria.
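The abstract does not spell out the data-generating model or the information criteria, but both follow standard definitions. The sketch below is a minimal, hypothetical illustration (parameter names and values are not taken from the study) of how polytomous responses can be drawn from a latent-class GPCM/PCM mixture and how AIC3 and the sample-size-adjusted BIC (SABIC) are computed from a fitted model's log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

def gpcm_probs(theta, alpha, deltas):
    """GPCM category probabilities for one item:
    P(X = k | theta) is proportional to exp(sum_{v<=k} alpha*(theta - delta_v)),
    with the empty sum (k = 0) defined as 0."""
    cum = np.concatenate(([0.0], np.cumsum(alpha * (theta - deltas))))
    expcum = np.exp(cum - cum.max())          # stabilized softmax
    return expcum / expcum.sum()

def simulate_responses(n, alphas, deltas, class_sizes):
    """Draw responses for a latent mixture of GPCMs.
    alphas[c][i]  -- discrimination of item i in class c (1.0 for a PCM)
    deltas[c][i]  -- threshold vector of item i in class c
    class_sizes   -- mixing proportions of the latent classes"""
    classes = rng.choice(len(class_sizes), size=n, p=class_sizes)
    thetas = rng.standard_normal(n)
    data = np.empty((n, len(alphas[0])), dtype=int)
    for p in range(n):
        c = classes[p]
        for i in range(len(alphas[c])):
            pr = gpcm_probs(thetas[p], alphas[c][i], deltas[c][i])
            data[p, i] = rng.choice(len(pr), p=pr)
    return data

def aic3(loglik, n_params):
    """AIC3: penalty of 3 per estimated parameter."""
    return -2.0 * loglik + 3.0 * n_params

def sabic(loglik, n_params, n_obs):
    """Sample-size-adjusted BIC with penalty log((n + 2) / 24)."""
    return -2.0 * loglik + n_params * np.log((n_obs + 2.0) / 24.0)

# Example: a setup resembling the "challenging" condition -- 5 items with
# 11 categories (10 thresholds), three latent classes, N = 2,500.
# Threshold values and class proportions here are arbitrary.
alphas = [[1.0] * 5 for _ in range(3)]                 # mPCM: equal slopes
deltas = [[np.linspace(-2, 2, 10) + 0.3 * c for _ in range(5)]
          for c in range(3)]
data = simulate_responses(2500, alphas, deltas, [0.5, 0.3, 0.2])
```

In a simulation of this kind, each generated data set would be fitted with candidate models of differing class numbers, and the resulting log-likelihoods would be passed to functions such as `aic3` and `sabic` to compare how reliably the criteria recover the true number of classes.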

Keywords