Turkish Journal of Agriculture: Food Science and Technology (Dec 2024)
On The Pros and Cons of Using Excel for Regression Analysis
Abstract
Excel is a useful and powerful software for simple regression analysis without any programming skills and that is why, it is widely preferred by the undergraduate students from various areas such as chemistry, biology and agriculture as well as many engineering disciplines such as chemical engineering, food engineering and bioengineering. Parameter values and coefficient of determination (R2) can be easily obtained together with the graphical representation for those models exist in Excel such as linear and exponential models. It is also possible to visually examine the model fit and experimental data together on the same graph. For linear models (linear in parameters) Excel Add-In Data Analysis-Regression tool creates a summary output, and parameter estimates, parameter uncertainties, adjusted R2 (R2adj) and root mean square error (RMSE) values can be found even for the models that do not exist in Excel. For nonlinear models (nonlinear in parameters) Excel Add-In Solver tool can be used to obtain parameter estimates (but not uncertainties), and R2, R2adj and RMSE can be calculated manually. Despite these advantages, there are some shortcomings of Excel for regression analysis. For linear models with no-intercept Excel reports the incorrect and overoptimistic R2 and also reports incorrect and overpessimistic R2adj. Excel has also some nonlinear models such as exponential and power models in it; however, Excel computes linear parameter estimates for those nonlinear models and again optimistic R2 is calculated. This paper aims (i) to show these inaccuracies with their reasons by using published data, (ii) how to obtain the correct results, (iii) to warn the instructors who would use Excel for regression analysis in class. Excel users (instructors, students, professionals in any field) should be aware of the pitfalls when using Excel for regression.
Keywords