Applied Computing and Geosciences (Jun 2020)
Partial correlations in compositional data analysis
Abstract
Partial correlations quantify the linear association between two variables while adjusting for the influence of the remaining variables. They form the backbone of graphical models and are readily obtained from the inverse of the covariance matrix. For compositional data, the covariance structure is specified from log-ratios of variables, which implies changes in the definition and interpretation of partial correlations. In the present work, we elucidate how results derived by Aitchison (1986) lead to a natural definition of partial correlation that has a number of advantages over current measures of association. For this, we show that the residuals of log-ratios between a variable and a reference, when adjusting for all remaining variables including the reference, are reference-independent. Since the reference itself can be controlled for, correlations between residuals are defined for the variables directly, without the need to specify the reference explicitly (as it is implicit in the variables that are partialled out). Thus, perhaps surprisingly, partial correlations do not have the problems commonly found with measures of pairwise association on compositional data. They are well-defined between two variables, are properly scaled, and allow for negative association. By design, they are subcompositionally incoherent, but they share this property with conventional partial correlations (where results change when adjusting for the influence of fewer variables). We discuss the reference-dependence of the multiple correlation coefficient as well as partial correlations that are obtained after effective library-size normalizations. We also determine the partial variances and correlations for two previously studied data sets and compare them with symmetric balance correlations and the proportionality coefficient.
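The reference-independence described above can be checked numerically. The following sketch (a minimal illustration, not code from the paper; the function name and the simulated Dirichlet data are assumptions for demonstration) computes partial correlations from the inverse covariance (precision) matrix of additive log-ratio (alr) coordinates and shows that the partial correlation between two parts is the same regardless of which third part serves as the reference:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated compositional data: 200 samples, 5 parts (each row sums to 1)
X = rng.dirichlet(np.arange(1.0, 6.0), size=200)

def partial_corr_alr(X, ref):
    """Partial correlations of alr-transformed data for a given reference part.

    Returns the list of part indices (all parts except the reference) and
    the matrix of partial correlations among their alr coordinates.
    """
    idx = [j for j in range(X.shape[1]) if j != ref]
    # alr transform: log of each part divided by the reference part
    Y = np.log(X[:, idx] / X[:, [ref]])
    P = np.linalg.inv(np.cov(Y, rowvar=False))  # precision matrix
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)                     # scale to partial correlations
    np.fill_diagonal(R, 1.0)
    return idx, R

# Partial correlation between parts 0 and 1, using two different references
idx4, R4 = partial_corr_alr(X, ref=4)
idx3, R3 = partial_corr_alr(X, ref=3)
r01_ref4 = R4[idx4.index(0), idx4.index(1)]
r01_ref3 = R3[idx3.index(0), idx3.index(1)]
print(np.isclose(r01_ref4, r01_ref3))  # True: reference-independent
```

The agreement is exact up to floating point: changing the reference shifts each alr coordinate by a log-ratio that already lies in the span of the variables being partialled out, so the regression residuals, and hence the partial correlations, are unchanged.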