Journal of Statistics and Data Science Education (Jan 2025)
Teaching/learning multiple regression using historical and modern family data
Abstract
To help students with the new concepts that arise in moving from simple to multiple regression, I have found it helpful to use readily understood real-world datasets that involve an engaging question, measurements that students can personally relate to (such as those involving themselves and their families), and just two regressors. I describe, provide copies of, and suggest possible didactic uses of ‘two-regressor’ datasets involving family data. The late-19th-century datasets, which gave rise to the very term ‘regression’, involve easily measured variables relating to students and their families, two weakly correlated parental regressors, and a written protocol that would allow today’s students to quickly assemble a modern version. The recent datasets involve a less easily measured but easily understood Y variable that can be modelled within either the ordinary linear or the Poisson (generalized linear) regression framework, and two readily obtained but very strongly correlated parental regressors. They also provide an engaging example of the striking difference between the regression coefficients in the two ‘1-regressor-at-a-time’ models and those in the single ‘2-regressors-at-once’ model.
Keywords