Electronic Research Archive (Jul 2024)
Assessing agreement between permutation and dropout variable importance methods for regression and random forest models
Abstract
Permutation techniques have been used extensively in machine learning algorithms for evaluating variable importance. In ordinary regression, however, variables are often removed to gauge their importance. In this paper, we compared the results for permuting variables to removing variables in regression to assess relations between these two methods. We compared permute-and-predict (PaP) methods with leave-one-covariate-out (LOCO) techniques. We also compared these results with conventional metrics such as regression coefficient estimates, t-statistics, and random forest out-of-bag (OOB) PaP importance. Our results indicate that permutation importance metrics are practically equivalent to those obtained from removing variables in a regression setting. We demonstrate a strong association between the PaP metrics, true coefficients, and regression-estimated coefficients. We also show a strong relation between the LOCO metrics and the regression t-statistics. Finally, we illustrate that manual PaP methods are not equivalent to the OOB PaP technique and suggest prioritizing the use of manual PaP methods on validation data.
Keywords