Journal of Statistics and Data Science Education (Apr 2025)
Developing Students’ Statistical Expertise through Writing in the Age of AI
Abstract
As large language models (LLMs) such as GPT have become more accessible, concerns about their potential effects on students’ learning have grown. In data science education, the specter of students’ turning to LLMs raises multiple issues, as writing is a means not just of conveying information but of developing their statistical reasoning. In our study, we engage with questions surrounding LLMs and their pedagogical impact by: 1) quantitatively and qualitatively describing how select LLMs write report introductions and complete data analysis reports; and 2) comparing patterns in texts authored by LLMs to those authored by students and by published researchers. Our results show distinct differences between machine-generated and human-generated writing, as well as between novice and expert writing. Those differences are evident in how writers manage information, modulate confidence, signal importance, and report statistics. The findings can help inform classroom instruction, whether that instruction is aimed at dissuading the use LLMs or at guiding their use as a productivity tool. It also has implications for students’ development as statistical thinkers and writers. What happens when they offload the work of data science to a model that doesn’t write quite like a data scientist?
Keywords