IEEE Access (Jan 2024)

Click-Based Representation Learning Framework of Student Navigational Behavior in MOOCs

  • Shrooq Al Amoudi,
  • Areej Alhothali,
  • Rsha Mirza,
  • Hussein Assalahi,
  • Tahani Aldosemani

DOI
https://doi.org/10.1109/ACCESS.2024.3450514
Journal volume & issue
Vol. 12
pp. 121480 – 121494

Abstract


Predictive models of learning outcomes for online students can help instructors estimate students' final performance in the early stages of a course. Anticipating student performance can improve learning efficiency. Existing models that analyzed student data have focused on handcrafted features, but these models are limited in their ability to uncover new behavioral patterns that indicate student performance and to show how such patterns can be used in online courses. Clickstream data contain a large amount of information that describes students' learning processes in detail, which makes them difficult to capture with handcrafted features. To analyze student behavior effectively, we transfer techniques from natural language processing (NLP) to student performance prediction in Massive Open Online Courses (MOOCs), motivated by the close resemblance between the two problems. In this article, we propose a novel framework that uses self-supervised learning on student clickstream data to automatically produce data representations that improve prediction outcomes. First, we developed a self-supervised clickstream pre-training setup to model learner click generation. Second, we adjusted these latent representations before applying them to a downstream supervised learning task. Extensive experiments on two real-world datasets demonstrated that the proposed approach is effective. Combining skip-gram embeddings with Principal Component Analysis (PCA) achieved the highest accuracy: on the XuetangX dataset, an accuracy of approximately 72.70% and an F1-score of approximately 81.03%. On the KDDCUP dataset, the same methodology performed even better, with an accuracy of 80.91% and an F1-score of 87.42%.
Our results showed the potential of NLP techniques to improve dropout prediction in MOOCs by extracting informative representations from clickstream data, allowing a deeper understanding of student behavior, and facilitating early intervention strategies.
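
As a rough illustration of the NLP analogy described in the abstract, the sketch below treats one student's click sequence as a "sentence" and builds the (center, context) pairs on which a skip-gram model would be trained. It is not the authors' implementation: the function name `skipgram_pairs`, the `window` parameter, and the click event names are all hypothetical, and the embedding training, PCA step, and downstream classifier are omitted.

```python
from typing import List, Tuple


def skipgram_pairs(clicks: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for one student's click sequence.

    Each click plays the role of a word; its context is every other click
    within `window` positions on either side, as in skip-gram training.
    """
    pairs = []
    for i, center in enumerate(clicks):
        lo = max(0, i - window)
        hi = min(len(clicks), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a click is never its own context
                pairs.append((center, clicks[j]))
    return pairs


# Hypothetical click events for a single session (assumed names, not from the paper).
session = ["play_video", "pause_video", "seek_video", "load_problem"]
pairs = skipgram_pairs(session, window=1)
```

In a full pipeline, pairs like these would feed a skip-gram model whose learned click vectors could then be reduced in dimension and passed to a supervised dropout classifier.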

Keywords