Econometrics (Aug 2024)

Is It Sufficient to Select the Optimal Class Number Based Only on Information Criteria in Fixed- and Random-Parameter Latent Class Discrete Choice Modeling Approaches?

  • Péter Czine,
  • Péter Balogh,
  • Zsanett Blága,
  • Zoltán Szabó,
  • Réka Szekeres,
  • Stephane Hess,
  • Béla Juhász

DOI
https://doi.org/10.3390/econometrics12030022
Journal volume & issue
Vol. 12, no. 3
p. 22

Abstract

Heterogeneity in preferences can be addressed through various discrete choice modeling approaches. The random-parameter latent class (RLC) approach is an attractive option for analysts because it separates classes with different preferences and captures the remaining within-class heterogeneity through random parameters. For latent class specifications, however, more empirical evidence on the optimal number of classes is needed in order to develop a more objective set of selection criteria. To investigate this question, we tested specifications with different numbers of classes (for both fixed- and random-parameter latent class models) using data from a discrete choice experiment conducted in 2021 that examined preferences regarding COVID-19 vaccines. We compared models using commonly used indicators such as the Bayesian information criterion, and we also took into account, among others, a seemingly simple but often overlooked indicator: the ratio of significant parameter estimates. Based on our results, information criteria alone are not sufficient for deciding on the optimal number of classes in latent class modeling.
We also considered several further aspects. The first is the ratio of significant parameter estimates, which can be examined both between and within specifications to find out which model type and class number yields the most balanced ratio. The second is the validity of the coefficients obtained, focusing on whether the conclusions are consistent with our theoretical model. The third is whether including random parameters is justified, that is, finding a balance between the complexity of the model and its information content by examining when, and to what extent, introducing within-class heterogeneity is relevant. The fourth is the distributions of the marginal rate of substitution (MRS) calculations: since these often function as a direct measure of preferences, it is necessary to test how consistent the distributions are across specifications with different class numbers. If they are highly consistent, i.e., relatively stable in explaining consumer preferences, it is probably worth putting more emphasis on the aspects mentioned above when choosing a model. The results of this research raise further questions that should be addressed by further model testing in the future.
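
As a rough illustration of two of the indicators named in the abstract, the sketch below computes the Bayesian information criterion from a model's log-likelihood and the share of parameter estimates that are significant under a simple two-sided z-test. The function names, the 1.96 critical value, and all numbers are assumptions made for this example; they are not taken from the study and do not reproduce the authors' procedure.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * LL (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def share_significant(estimates, std_errors, z_crit: float = 1.96) -> float:
    """Share of parameters with |estimate / standard error| above z_crit.

    z_crit = 1.96 corresponds to a two-sided 5% test under a normal
    approximation; this threshold is an assumption of the sketch.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must be non-empty and equal in length")
    n_sig = sum(abs(b / se) > z_crit for b, se in zip(estimates, std_errors))
    return n_sig / len(estimates)


# Hypothetical values for two candidate class numbers (illustration only,
# not results from the paper).
candidates = {
    2: {"ll": -1500.0, "est": [0.8, -0.4, 0.1, 1.2], "se": [0.2, 0.1, 0.2, 0.3]},
    3: {"ll": -1480.0, "est": [0.9, -0.5, 0.05, 1.1, 0.2, -0.1],
        "se": [0.3, 0.2, 0.2, 0.4, 0.3, 0.2]},
}
n_obs = 2000  # hypothetical number of observations

for n_classes, m in candidates.items():
    k = len(m["est"])
    print(n_classes, round(bic(m["ll"], k, n_obs), 2),
          round(share_significant(m["est"], m["se"]), 2))
```

Reporting both numbers side by side makes it easier to see cases where a specification with a better information criterion also has a noticeably lower share of significant estimates, which is the kind of trade-off the abstract describes.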

Keywords