Physical Review Physics Education Research (Jul 2019)
Longitudinal predictions using regression-corrected grouping to reduce regression to the mean
Abstract
[This paper is part of the Focused Collection on Quantitative Methods in PER: A Critical Examination.] The education of physicists requires annual progress in mathematics through grades K–12, continuing on to high school physics, undergraduate instruction, and often many years of postgraduate study. The traditional domain of physics education research is the study of pedagogical techniques for a single lesson or a single course, but the physics and physics education research communities will need to study the complete educational system to increase performance and participation, especially for underrepresented groups. Analysis of long-term student outcomes and policy impacts requires predictive longitudinal techniques. We begin with concepts from fluid and statistical mechanics—trajectories and streamlines—to visualize longitudinal outcomes and make predictions. Drawing on coarse-graining procedures to model the movement of particles in a fluid, the longitudinal data are sorted into score bins to depict the flow of scores through time: trajectories depict the average scores over time for initial score bins and streamlines provide an approximate way to calculate the flow of student scores over many years based on only two consecutive years of data. However, due to the partially stochastic nature of observed scores, the coarse-graining procedure that sorts students into score bins amplifies a statistical phenomenon known as regression to the mean (RTM). As a result, streamlines do not provide an accurate prediction for the future performance of students. Here we discuss a new coarse-graining procedure, regression-corrected (RC) grouping, which reduces RTM in the streamlines. We explain the idea of RC streamlines through a toy model of freshman physics performance and then apply them to the realistic setting of the Texas State Longitudinal Data System, which contains standardized testing data for students throughout primary and secondary school since 2003. 
We show that the RC streamlines accurately predict trajectories using only 2 or 3 years of data. Therefore, RC streamlines can be used to identify the effects of academic interventions on a time scale comparable to that of policy changes. We illustrate this assertion by examining a particular policy intervention, Texas’s Student Success Initiative.
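The claim that sorting students into score bins amplifies regression to the mean can be illustrated with a small simulation. The sketch below is not the paper's method or data: it assumes a hypothetical toy model in which each observed score is a fixed true ability plus independent yearly noise, and the function names and parameter values are illustrative only. It computes trajectories (mean score per year for each initial score bin) and does not attempt to implement RC grouping, whose details are not given in the abstract.

```python
import random
from statistics import mean


def simulate_scores(n_students=20000, n_years=4, noise_sd=0.6, seed=0):
    """Toy model (an assumption, not the paper's data):
    observed score = fixed true ability + independent noise each year."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_students):
        ability = rng.gauss(0.0, 1.0)
        scores.append([ability + rng.gauss(0.0, noise_sd) for _ in range(n_years)])
    return scores


def trajectories(scores, n_bins=5):
    """Group students into equal-sized bins by their year-0 observed score
    and return the mean score of each bin in every year."""
    ranked = sorted(scores, key=lambda row: row[0])
    size = len(ranked) // n_bins
    n_years = len(ranked[0])
    result = []
    for b in range(n_bins):
        # The last bin absorbs any remainder so that every student is used.
        end = (b + 1) * size if b < n_bins - 1 else len(ranked)
        group = ranked[b * size:end]
        result.append([mean(row[y] for row in group) for y in range(n_years)])
    return result


scores = simulate_scores()
traj = trajectories(scores)
for b, row in enumerate(traj):
    print(f"bin {b}: " + ", ".join(f"{v:+.2f}" for v in row))
```

In this toy model no student's ability changes, yet the highest bin's mean drops and the lowest bin's mean rises between year 0 and year 1, after which the bin means stay roughly flat. That first-year jump is produced by grouping on a noisy observed score, which is the effect a two-year streamline built from such bins would wrongly carry forward as real change.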