Frontiers in Epidemiology (Sep 2023)

Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias

  • Elinor Curnow,
  • Kate Tilling,
  • Jon E. Heron,
  • Rosie P. Cornish,
  • James R. Carpenter

DOI
https://doi.org/10.3389/fepid.2023.1237447
Journal volume & issue
Vol. 3

Abstract

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables (“auxiliary variables”). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e., it is a “collider”), its inclusion can induce bias in the MI estimator and may increase the SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which a complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and considering plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders.
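
The collider mechanism described above can be illustrated with a small simulation. The sketch below is not the authors' simulation design; the data-generating model, coefficient values, sample size, and all names (`simulate`, `mi_slope`, `impute_once`, the variables `u`, `w`, `z`) are assumptions chosen for illustration. It assumes a partially observed outcome `y`, a fully observed exposure `x`, and an auxiliary variable `z` that shares one unmeasured cause (`u`) with the outcome and another (`w`) with the missingness indicator. Missingness is assumed to depend on `x` and `w` but not on `y`, so a complete records analysis should be valid. Only pooled point estimates are computed; Rubin's rules for the variance are omitted for brevity.

```python
import numpy as np


def ols(design, y):
    """Least-squares coefficients for y regressed on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def impute_once(rng, d_obs, y_obs, d_mis):
    """One proper imputation draw under a normal linear imputation model."""
    coef = ols(d_obs, y_obs)
    resid = y_obs - d_obs @ coef
    df = d_obs.shape[0] - d_obs.shape[1]
    sigma2 = resid @ resid / rng.chisquare(df)       # draw residual variance
    cov = sigma2 * np.linalg.inv(d_obs.T @ d_obs)
    coef_draw = rng.multivariate_normal(coef, cov)   # draw coefficients
    noise = rng.normal(0.0, np.sqrt(sigma2), d_mis.shape[0])
    return d_mis @ coef_draw + noise


def mi_slope(rng, x, y, observed, aux=None, m=5):
    """Pooled MI point estimate of the slope of y on x (mean over m imputations)."""
    cols = [np.ones_like(x), x] + ([aux] if aux is not None else [])
    imp_design = np.column_stack(cols)   # imputation model: x, optionally aux
    analysis_design = imp_design[:, :2]  # substantive model: intercept and x
    slopes = []
    for _ in range(m):
        y_completed = y.copy()
        y_completed[~observed] = impute_once(
            rng, imp_design[observed], y[observed], imp_design[~observed]
        )
        slopes.append(ols(analysis_design, y_completed)[1])
    return float(np.mean(slopes))


def simulate(n=100_000, beta=1.0, seed=2023):
    """Assumed (hypothetical) data-generating model with a collider z."""
    rng = np.random.default_rng(seed)
    x, u, w = rng.normal(size=(3, n))
    y = beta * x + u + rng.normal(size=n)          # u is a cause of the outcome
    z = u + w + rng.normal(size=n)                 # z is caused by both u and w
    observed = (x + w + rng.normal(size=n)) > 0    # missingness caused by x and w
    design = np.column_stack([np.ones(n), x])
    return {
        "cra": float(ols(design[observed], y[observed])[1]),
        "mi_no_aux": mi_slope(rng, x, y, observed),
        "mi_collider": mi_slope(rng, x, y, observed, aux=z),
    }


results = simulate()
print(results)
```

Under these assumed settings, the complete records estimate and the MI estimate without the auxiliary variable should both lie close to the true exposure coefficient (`beta`), while the MI estimate that includes `z` is expected to be shifted away from it. The size of the shift depends entirely on the assumed coefficients and should not be read as a result from the paper.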

Keywords