Journal of Statistics and Data Science Education (Jan 2025)

Teaching/learning multiple regression using historical and modern family data

  • James A. Hanley

DOI
https://doi.org/10.1080/26939169.2025.2458001

Abstract

To deal with the new concepts involved when moving up from simple to multiple regression, I have found that it helps to use readily understood real-world datasets that involve an engaging question, measurements that students can personally relate to (such as those involving themselves and their families), and just two regressors. I describe, provide copies of, and suggest possible didactic uses of ‘two-regressor’ datasets involving family data. The late-19th-century datasets, which gave rise to the very term ‘regression’, involve easily measured variables relating to students and their families, two weakly correlated parental regressors, and a written protocol that would allow a modern version to be quickly assembled by today’s students. The recent datasets involve a less easily measured but easily understood Y variable that can be modelled within the ordinary or the Poisson (generalized) linear model regression framework, and two readily obtained but very strongly correlated parental regressors. They also provide an engaging example of the striking difference between the regression coefficients in the two ‘1-regressor-at-a-time’ models and those in the one ‘2-regressors-at-once’ model.

Keywords