Journal of Statistics and Data Science Education (Oct 2024)

Teaching Modeling in Introductory Statistics: A Comparison of Formula and Tidyverse Syntaxes

  • Amelia McNamara

DOI
https://doi.org/10.1080/26939169.2024.2394545
Journal volume & issue
Vol. 32, no. 4
pp. 374 – 394

Abstract

When incorporating programming into a statistics course, there are many pedagogical considerations. In R, one consideration is the particular R syntax used. This article reports on a head-to-head comparison of a pair of introductory statistics labs, one conducted in the formula syntax, the other in tidyverse. Pre- and post-surveys show minimal differences between labs, with students reporting a positive experience regardless of section. Analysis of YouTube and RStudio Cloud data shows interesting distinctions. The formula section appeared to watch a larger proportion of pre-lab videos, but spend less time computing. Conversely, the tidyverse section watched a smaller proportion of videos and spent more time computing. The tidyverse labs were slightly longer in terms of lines of code and minutes of videos. The tidyverse labs exposed students to more distinct R functions, but reused functions more frequently. Both labs relied on a relatively small vocabulary of functions, which can provide a starting point for instructors interested in teaching introductory statistics in R. The instructor experience of teaching the two syntaxes diverged when discussing relationships between categorical variables, as well as working with numeric variable summary statistics. This work provides additional evidence for instructors looking to choose between syntaxes for introductory statistics.

Keywords