Tutorials in Quantitative Methods for Psychology (Mar 2024)

Identifying Influential Observations in Multiple Regression

  • Camilleri, Carmel
  • Alter, Udi
  • Cribbie, Robert A.

DOI
https://doi.org/10.20982/tqmp.20.2.p096
Journal volume & issue
Vol. 20, no. 2
pp. 96 – 105

Abstract

Linear models are particularly vulnerable to influential observations, which disproportionately affect the model's parameter estimates. Multiple statistics and numerous cut-off values have been proposed to detect highly influential observations, including Cook's Distance (CD), Standardized Difference of Fits (DFFITS), and Standardized Difference of Beta (DFBETAS). This paper reports on a Monte Carlo simulation study that assesses the effectiveness of these methods and their recommended cut-off values under various conditions, including different sample sizes, numbers of predictors, strengths of variable associations, and non-sequential versus sequential analysis approaches within a multiple linear regression framework. The findings suggest that the proportion of observations identified as highly influential varies considerably depending on the diagnostic method chosen and the threshold used for detection. Consequently, researchers should consider the implications of their methodological choices and the thresholds they apply when identifying influential data points.
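
To make the three diagnostics named in the abstract concrete, the following is a minimal sketch of how CD, DFFITS, and DFBETAS can be computed for an ordinary least squares fit using their standard closed-form expressions. It is not the authors' simulation code. The function name `influence_measures`, the toy data, the planted outlier, and the three cut-offs (4/n for CD, 2·sqrt(p/n) for DFFITS, 2/sqrt(n) for DFBETAS) are assumptions chosen for illustration; they are commonly cited rules of thumb, and the abstract does not state which cut-offs the study compared.

```python
import numpy as np

def influence_measures(X, y):
    """Closed-form Cook's D, DFFITS and DFBETAS for an OLS fit with intercept.

    Hypothetical helper for illustration; X is (n, k) without the intercept column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])        # design matrix with intercept
    p = Xd.shape[1]                              # number of estimated coefficients

    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta = XtX_inv @ Xd.T @ y                    # OLS coefficients
    resid = y - Xd @ beta
    h = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)  # leverages (hat-matrix diagonal)
    s2 = resid @ resid / (n - p)                 # residual variance, full sample

    # Cook's Distance: scaled shift in all fitted values when case i is dropped
    cooks_d = resid**2 / (p * s2) * h / (1 - h) ** 2

    # Residual variance with case i deleted, then externally studentized residuals
    s2_loo = ((n - p) * s2 - resid**2 / (1 - h)) / (n - p - 1)
    t = resid / np.sqrt(s2_loo * (1 - h))

    # DFFITS: standardized change in the fitted value of case i
    dffits = t * np.sqrt(h / (1 - h))

    # DFBETAS: standardized change in each coefficient when case i is dropped
    delta_beta = (XtX_inv @ Xd.T).T * (resid / (1 - h))[:, None]   # shape (n, p)
    dfbetas = delta_beta / np.sqrt(s2_loo[:, None] * np.diag(XtX_inv)[None, :])

    return cooks_d, dffits, dfbetas, p

# Toy data (assumed for illustration): two predictors, one planted influential case
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=n)
X[0] = [4.0, 4.0]    # high leverage
y[0] = 15.0          # far from the regression surface

cooks_d, dffits, dfbetas, p = influence_measures(X, y)

# Assumed rule-of-thumb cut-offs; other published thresholds flag different cases
flag_cd = cooks_d > 4 / n
flag_dffits = np.abs(dffits) > 2 * np.sqrt(p / n)
flag_dfbetas = (np.abs(dfbetas) > 2 / np.sqrt(n)).any(axis=1)

print("Proportion flagged by CD:     ", flag_cd.mean())
print("Proportion flagged by DFFITS: ", flag_dffits.mean())
print("Proportion flagged by DFBETAS:", flag_dfbetas.mean())
```

Running the three flags side by side on the same data shows the kind of comparison the abstract describes: the share of cases labelled influential depends on both the statistic and the threshold applied to it.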

Keywords