Evaluating Embeddings from Pre-Trained Language Models and Knowledge Graphs for Educational Content Recommendation

Xiu Li; Aron Henriksson; Martin Duneld; Jalal Nouri; Yongchao Wu

doi:10.3390/fi16010012

Future Internet (Dec 2023)

Affiliations

All authors: Department of Computer and Systems Sciences, Stockholm University, NOD-Huset, Borgarfjordsgatan 12, 16455 Stockholm, Sweden

DOI: https://doi.org/10.3390/fi16010012
Journal volume & issue: Vol. 16, no. 1, p. 12

Abstract

Educational content recommendation is a cornerstone of AI-enhanced learning. In particular, to facilitate navigating the diverse learning resources available on learning platforms, methods are needed for automatically linking learning materials, e.g., in order to recommend textbook content based on exercises. Such methods are typically based on semantic textual similarity (STS) and the use of embeddings for text representation. However, it remains unclear what types of embeddings should be used for this task. In this study, we carry out an extensive empirical evaluation of embeddings derived from three different types of models: (i) static embeddings trained using a concept-based knowledge graph, (ii) contextual embeddings from a pre-trained language model, and (iii) contextual embeddings from a large language model (LLM). In addition to evaluating the models individually, various ensembles are explored based on different strategies for combining two models in an early vs. late fusion fashion. The evaluation is carried out using digital textbooks in Swedish for three different subjects and two types of exercises. The results show that using contextual embeddings from an LLM leads to superior performance compared to the other models, and that there is no significant improvement when combining these with static embeddings trained using a knowledge graph. When using embeddings derived from a smaller language model, however, it helps to combine them with knowledge graph embeddings. The performance of the best-performing model is high for both types of exercises, resulting in a mean Recall@3 of 0.96 and 0.95 and a mean MRR of 0.87 and 0.86 for quizzes and study questions, respectively, demonstrating the feasibility of using STS based on text embeddings for educational content recommendation. The ability to link digital learning materials in an unsupervised manner—relying only on readily available pre-trained models—facilitates the development of AI-enhanced learning.
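
To make the evaluation setup described in the abstract concrete, the following is a minimal sketch of how exercises could be linked to textbook sections by cosine similarity and scored with Recall@k and MRR. It is an illustration only, not the authors' implementation: the function names, the toy vectors, the concatenation-based early fusion and the weighted-average late fusion are all assumptions, since the abstract does not specify the exact fusion strategies or models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_sections(query_vec, section_vecs):
    """Return section indices sorted by descending similarity to the query."""
    scores = [cosine(query_vec, s) for s in section_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def early_fusion(vec_a, vec_b):
    """Assumed early-fusion variant: concatenate two L2-normalised embeddings."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n else list(v)
    return unit(vec_a) + unit(vec_b)


def late_fusion(scores_a, scores_b, weight=0.5):
    """Assumed late-fusion variant: weighted average of per-model similarity scores."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]


def recall_at_k(rankings, gold, k=3):
    """Fraction of exercises whose gold section appears in the top k results."""
    hits = sum(1 for ranking, g in zip(rankings, gold) if g in ranking[:k])
    return hits / len(gold)


def mean_reciprocal_rank(rankings, gold):
    """Mean of 1 / (rank of the gold section) over all exercises."""
    return sum(1.0 / (r.index(g) + 1) for r, g in zip(rankings, gold)) / len(gold)


# Hypothetical toy embeddings: three textbook sections and two exercises.
sections = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
exercises = [[0.9, 0.1], [0.2, 0.8]]
gold = [0, 2]  # index of the correct section for each exercise

rankings = [rank_sections(e, sections) for e in exercises]

if __name__ == "__main__":
    print("Recall@1:", recall_at_k(rankings, gold, k=1))
    print("Recall@3:", recall_at_k(rankings, gold, k=3))
    print("MRR:", mean_reciprocal_rank(rankings, gold))
```

In practice, the toy vectors would be replaced by embeddings from the models being compared, and the same ranking and scoring functions would be applied to each model or ensemble.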

Published in Future Internet

ISSN: 1999-5903 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/futureinternet/
