Stats (Apr 2023)

Model Selection with Missing Data Embedded in Missing-at-Random Data

  • Keiji Takai
  • Kenichi Hayashi

DOI
https://doi.org/10.3390/stats6020031
Journal volume & issue
Vol. 6, no. 2
pp. 495–505

Abstract


When models are built with missing data, an information criterion is needed to select the best model among the various candidates. Using a conventional information criterion for missing data may lead to the selection of the wrong model when data are not missing at random. Conventional information criteria implicitly assume that any subset of missing-at-random data is also missing at random, and thus the maximum likelihood estimator is assumed to be consistent; that is, it is assumed that the estimator will converge to the true value. However, this assumption may not be practical. In this paper, we develop an information criterion that works even for not-missing-at-random data, so long as the largest missing data set is missing at random. Simulations are performed to show the superiority of the proposed information criterion over conventional criteria.
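The failure mode described in the abstract, where a subset of missing-at-random data is itself not missing at random, can be illustrated with a small simulation. This is a minimal sketch under assumed settings, not the information criterion proposed in the paper: the function name `simulate_embedded_missingness`, the normal data model, and the rule "y is missing whenever x > 0" are all hypothetical choices made for illustration. Because x is always observed and alone drives the missingness, the pair (x, y) is missing at random. A submodel that drops x sees only y, whose missingness now depends on an unmodeled variable correlated with y, so the estimator of its mean is no longer consistent.

```python
import random
import statistics


def simulate_embedded_missingness(n=20000, seed=0):
    """Return (full_mean, observed_only_mean, regression_mean) for y.

    Hypothetical setup: x ~ N(0, 1), y = x + N(0, 1), so the true mean of y is 0.
    """
    rng = random.Random(seed)
    x_all, y_all, observed = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        x_all.append(x)
        y_all.append(y)
        # y is missing whenever x > 0. Since x is always observed,
        # the missingness of (x, y) is missing at random.
        observed.append(x <= 0.0)

    x_obs = [x for x, keep in zip(x_all, observed) if keep]
    y_obs = [y for y, keep in zip(y_all, observed) if keep]

    # Submodel that ignores x: average the observed y values only.
    observed_only_mean = statistics.fmean(y_obs)

    # Model that uses x: least-squares fit on complete pairs,
    # then average the fitted line over every x (observed for all units).
    mx, my = statistics.fmean(x_obs), statistics.fmean(y_obs)
    sxx = sum((x - mx) ** 2 for x in x_obs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_obs, y_obs))
    slope = sxy / sxx
    intercept = my - slope * mx
    regression_mean = intercept + slope * statistics.fmean(x_all)

    # full_mean uses the y values that would be hidden in practice;
    # it is returned only as a reference point.
    return statistics.fmean(y_all), observed_only_mean, regression_mean


full_mean, observed_only_mean, regression_mean = simulate_embedded_missingness()
print(f"full-data mean of y:        {full_mean:.3f}")
print(f"y-only (subset) estimate:   {observed_only_mean:.3f}")
print(f"estimate using x as well:   {regression_mean:.3f}")
```

In this sketch the estimate that uses x stays close to the true mean, while the y-only estimate is pulled well below it. A criterion that treats the y-only likelihood as if it came from missing-at-random data would therefore be comparing candidate models on a biased fit, which is the situation the abstract warns about.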

Keywords