Journal of Causal Inference (Mar 2018)
Detecting Confounding in Multivariate Linear Models via Spectral Analysis
Abstract
We study a model in which one target variable $Y$ is correlated with a vector $\textbf{X}:=(X_1,\dots,X_d)$ of predictor variables that are potential causes of $Y$. We describe a method that infers to what extent the statistical dependences between $\textbf{X}$ and $Y$ are due to the influence of $\textbf{X}$ on $Y$ and to what extent they are due to a hidden common cause (confounder) of $\textbf{X}$ and $Y$. The method relies on concentration-of-measure results for large dimensions $d$ and on an independence assumption stating that, in the absence of confounding, the vector of regression coefficients describing the influence of each component of $\textbf{X}$ on $Y$ typically has ‘generic orientation’ relative to the eigenspaces of the covariance matrix of $\textbf{X}$. For the special case of a scalar confounder, we show that confounding typically spoils this generic orientation in a characteristic way that can be used to quantitatively estimate the amount of confounding (subject to our idealized model assumptions).
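To make the notion of ‘generic orientation’ concrete, the following is a minimal numerical sketch rather than the estimator developed in the paper. It assumes a toy linear model that is not spelled out in the abstract: $\textbf{X} = \textbf{E} + \textbf{b}Z$ and $Y = \langle \textbf{a}, \textbf{X}\rangle + cZ$ with a unit-variance scalar confounder $Z$. The function names (`spectral_weights`, `eigenvalue_ratio`, `toy_model`), the parameter values, and the summary statistic are all hypothetical choices made for illustration. The sketch uses population covariances, so no sampling noise is involved.

```python
import numpy as np


def spectral_weights(cov_xx, coef):
    """Normalized squared overlaps of coef with the eigenvectors of cov_xx."""
    # eigh returns ascending eigenvalues and orthonormal eigenvectors (columns).
    eigvals, eigvecs = np.linalg.eigh(cov_xx)
    overlaps = eigvecs.T @ coef
    weights = overlaps**2 / np.sum(overlaps**2)
    return eigvals, weights


def eigenvalue_ratio(cov_xx, coef):
    """Weighted mean eigenvalue seen by coef, divided by the plain mean eigenvalue.

    Illustrative statistic (an assumption of this sketch): for a generically
    oriented vector in high dimension the ratio should be close to 1.
    """
    eigvals, weights = spectral_weights(cov_xx, coef)
    return float(np.sum(weights * eigvals) / np.mean(eigvals))


def toy_model(d, c, seed=0):
    """Population quantities of the assumed model X = E + b*Z, Y = <a, X> + c*Z."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((d, 2 * d))
    cov_ee = g @ g.T / (2 * d)                     # covariance of E (well conditioned)
    a = rng.standard_normal(d) / np.sqrt(d)        # causal vector, drawn isotropically
    b = 0.5 * rng.standard_normal(d) / np.sqrt(d)  # effect of Z on X
    cov_xx = cov_ee + np.outer(b, b)               # Var(Z) = 1
    cov_xy = cov_xx @ a + c * b                    # Cov(X, Y)
    coef = np.linalg.solve(cov_xx, cov_xy)         # regression vector a + c * cov_xx^{-1} b
    return cov_xx, coef, a


# Same seed, so both cases share cov_xx, a and b; only c differs.
cov_xx, coef_clean, a = toy_model(d=400, c=0.0)
_, coef_conf, _ = toy_model(d=400, c=5.0)

ratio_unconfounded = eigenvalue_ratio(cov_xx, coef_clean)
ratio_confounded = eigenvalue_ratio(cov_xx, coef_conf)
print("unconfounded ratio:", ratio_unconfounded)
print("confounded ratio:  ", ratio_confounded)
```

In this toy setting the unconfounded regression vector equals $\textbf{a}$ and spreads its weight roughly evenly over the eigenvectors, so the ratio should stay near 1. With $c \neq 0$ the extra term $c\,\Sigma_{\textbf{XX}}^{-1}\textbf{b}$ shifts weight toward eigenvectors with small eigenvalues and the ratio should drop noticeably below 1. How such a shift can be turned into a quantitative estimate of confounding is the subject of the paper itself and is not reproduced here.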
Keywords