IEEE Access (Jan 2024)
The Effects of Data Imputation on Covariance and Inverse Covariance Matrix Estimation
Abstract
Many data analysis techniques (correlation heatmaps, linear discriminant analysis, quadratic discriminant analysis) rely on an estimate of the covariance matrix or its inverse, the precision matrix. Missing data, however, pose significant challenges to this parameter estimation problem. When data are missing, imputation is a common remedy because it renders the data complete. Nevertheless, the trade-offs of choosing imputation over task-specific methods for handling missing data deserve scrutiny, especially with regard to subsequent data analysis and inference. In this study, we conduct both empirical and theoretical investigations to assess the impact of imputation compared with direct parameter estimation approaches. We focus on estimating the covariance matrix and the precision matrix, and we analyze the error induced when the precision matrix is estimated by inverting an estimated covariance matrix. Additionally, we propose a sufficient condition that ensures improved performance guarantees for precision matrix estimation based on covariance matrix estimation. The experimental results show that when the number of features is small, direct parameter estimation can be recommended, with the precision matrix obtained by inverting the directly estimated covariance matrix. When the number of features is not small, however, inverting the covariance matrix of the imputed data gives better results.
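To make the comparison concrete, the following is a minimal sketch of the two routes to a precision matrix estimate: inverting a covariance matrix estimated directly from the incomplete data, and inverting the covariance matrix of imputed data. It rests on several assumptions that are not taken from the study: the data are synthetic Gaussian samples with entries missing completely at random, a pairwise-complete estimator stands in for the direct parameter estimation approach, mean imputation stands in for the imputation method, and error is measured in the Frobenius norm. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate(n=500, p=4, missing_rate=0.2, seed=0):
    """Draw Gaussian data and blank out entries completely at random (assumed setup)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(p, p))
    sigma = A @ A.T + p * np.eye(p)  # symmetric positive definite ground truth
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    X[rng.random((n, p)) < missing_rate] = np.nan
    return sigma, X

def pairwise_covariance(X):
    """Covariance from pairwise-complete observations (stand-in for a direct estimator)."""
    p = X.shape[1]
    cov = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            xi, xj = X[both, i], X[both, j]
            cov[i, j] = cov[j, i] = np.mean((xi - xi.mean()) * (xj - xj.mean()))
    return cov

def mean_impute(X):
    """Replace each missing entry with the observed mean of its column."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

sigma, X = simulate()
precision_true = np.linalg.inv(sigma)

estimates = {
    "direct (pairwise-complete)": pairwise_covariance(X),
    "mean imputation": np.cov(mean_impute(X), rowvar=False),
}

results = {}
for name, cov in estimates.items():
    cov_err = np.linalg.norm(cov - sigma, "fro")
    # The precision estimate is the inverse of the estimated covariance matrix.
    prec_err = np.linalg.norm(np.linalg.inv(cov) - precision_true, "fro")
    results[name] = (cov_err, prec_err)
    print(f"{name}: covariance error {cov_err:.3f}, precision error {prec_err:.3f}")
```

Varying the hypothetical parameters `p` and `missing_rate` gives a rough feel for how the inversion step changes the relative error of the two routes. The sketch is not a substitute for the experiments reported in the paper, and a pairwise-complete estimate is not guaranteed to be positive definite, so the inversion can become unstable as `p` grows.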
Keywords