Research in Statistics (Dec 2025)
Enhancing imputation accuracy for catch-all missing data mechanisms with DFBETAS and leverage
Abstract
This paper addresses the challenge of missing data in scientific research. It specifically examines missing data arising from a “catch-all” missing not at random (MNAR) mechanism, where missing values come disproportionately from one category of a variable, as can occur with income or ethnicity in surveys. The study introduces the use of the regression diagnostic DFBETAS, along with Leverage, to improve the imputation of categorical data under such conditions. DFBETAS, a measure of influence in regression, is adapted to capture the intrinsic information of missing values, thereby enhancing the imputation process within a Bayesian multiple imputation (MI) framework. We validate the proposed approach through Monte Carlo simulations with data-generating mechanisms based on probability distributions. The results show that incorporating DFBETAS and Leverage significantly improves the accuracy of imputations, optimizes the balance between sensitivity and specificity, reduces bias, and enhances confidence interval coverage of imputed estimates, especially as the strength of the catch-all mechanism increases. The study demonstrates that MI with DFBETAS and Leverage outperforms standard MI methods, offering a robust solution for handling categorical data with catch-all MNAR mechanisms. This advancement in imputation methodology provides a more accurate and efficient means of dealing with missing data in various research fields.
Keywords