Scientific Data (May 2024)
ACcoding: A graph-based dataset for online judge programming
Abstract
A well-designed educational programming dataset is a valuable asset for students and educators. Such a dataset enables students to improve their programming performance continuously and provides researchers with a significant data source for identifying students’ learning behaviours and enhancing the quality of programming education. Several existing datasets for programming education are limited by either a small number of participating students or a short span of learning records, which makes it difficult to investigate students’ learning patterns in programming. We present a large-scale, graph-based dataset specialized in programming learning on an Online Judge (OJ) platform. The dataset, named ACcoding, was built by a university teaching group. As of the submission date of the initial manuscript of this paper (May 6, 2022), the dataset contains 4,046,652 task-solving records submitted by 27,444 students on 4,559 programming tasks over a span of 6 years. The large size of the dataset, combined with rich functional features, empowers educators to trace students’ programming progress and choose appropriate programming tasks for specific training purposes. We also present examples of applications that use the dataset.