AERA Open (Oct 2022)

Forecasting Undergraduate Majors: A Natural Language Approach

  • David Lang
  • Alex Wang
  • Nathan Dalal
  • Andreas Paepcke
  • Mitchell L. Stevens

DOI
https://doi.org/10.1177/23328584221126516
Journal volume & issue
Vol. 8

Abstract

Committing to a major is a fateful step in an undergraduate education, yet the relationship between courses taken early in an academic career and ultimate major issuance remains little studied at scale. Using transcript data capturing the academic careers of 26,892 undergraduates enrolled at a private university between 2000 and 2020, we describe enrollment histories by using natural-language methods and vector embeddings to forecast terminal major on the basis of course sequences beginning at college entry. We find that (a) a student’s very first enrolled course predicts their major 30 times better than random guessing and more than one-third better than majority-class voting, (b) modeling strategies substantially influence forecasting metrics, and (c) course portfolios vary substantially within majors, such that students with the same major exhibit relatively modest overlap.
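
To make the two reference points in finding (a) concrete, here is a minimal sketch of how random-guessing and majority-class baselines can be computed and compared with a forecast based on a student's first course. It is not the authors' method: the paper uses natural-language methods and vector embeddings, whereas this sketch substitutes a plain frequency lookup. The course codes, major labels, and function names are all hypothetical, and the toy records are illustrative only, not data from the study.

```python
from collections import Counter, defaultdict


def random_baseline(majors):
    """Expected accuracy of guessing uniformly among the distinct majors."""
    return 1.0 / len(set(majors))


def majority_baseline(majors):
    """Accuracy of always predicting the single most common major."""
    counts = Counter(majors)
    return counts.most_common(1)[0][1] / len(majors)


def first_course_accuracy(train, test):
    """Predict a major from the first course using a frequency lookup.

    Each record is a (first_course, major) pair. A test student is assigned
    the most common major among training students who shared the same first
    course; unseen courses fall back to the overall majority major.
    """
    by_course = defaultdict(Counter)
    overall = Counter()
    for course, major in train:
        by_course[course][major] += 1
        overall[major] += 1
    fallback = overall.most_common(1)[0][0]

    correct = 0
    for course, major in test:
        if course in by_course:
            predicted = by_course[course].most_common(1)[0][0]
        else:
            predicted = fallback
        correct += predicted == major
    return correct / len(test)


# Hypothetical toy records: (first enrolled course, terminal major).
train = [
    ("CS101", "CS"), ("CS101", "CS"), ("CS101", "CS"), ("CS101", "Math"),
    ("MATH51", "Math"), ("MATH51", "Math"),
    ("ECON1", "Econ"), ("ECON1", "Econ"), ("ECON1", "CS"),
]
test = [
    ("CS101", "CS"), ("MATH51", "Math"), ("ECON1", "Econ"), ("HIST1", "History"),
]

train_majors = [major for _, major in train]
print("random guess:", random_baseline(train_majors))
print("majority class:", majority_baseline(train_majors))
print("first-course lookup:", first_course_accuracy(train, test))
```

With many majors the random baseline shrinks toward zero, so a large multiple of it is easy to reach. The majority-class baseline is the harder reference point, which is presumably why the abstract reports both.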