Frontiers in Education (Oct 2024)

LLM-generated competence-based e-assessment items for higher education mathematics: methodology and evaluation

  • Roy Meissner,
  • Alexander Pögelt,
  • Katja Ihsberner,
  • Martin Grüttmüller,
  • Silvana Tornack,
  • Andreas Thor,
  • Norbert Pengel,
  • Heinz-Werner Wollersheim,
  • Wolfram Hardt

DOI: https://doi.org/10.3389/feduc.2024.1427502
Journal volume & issue: Vol. 9

Abstract

In this article, we explore the transformative impact of advanced, parameter-rich Large Language Models (LLMs) on the production of instructional materials in higher education, with a focus on the automated generation of both formative and summative assessments for learners in the field of mathematics. We introduce a novel LLM-driven process and application, called ItemForge, tailored specifically to the automatic generation of e-assessment items in mathematics. The approach is closely aligned with the levels and hierarchy of cognitive learning objectives developed by Anderson and Krathwohl, and incorporates specific mathematical concepts from the courses under consideration. In a small-scale study, three mathematical experts reviewed a total of 240 generated free-text items together with their corresponding answers (sample solutions), assessing their quality and their appropriateness to the designated cognitive level and subject matter. Our findings demonstrate that the tool is proficient in producing high-quality items that align with the chosen concepts and targeted cognitive levels, indicating its potential suitability for educational purposes. However, the provided answers (sample solutions) occasionally exhibited inaccuracies or were incomplete, signalling that the tool's processes need further refinement.
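
To make the kind of pipeline the abstract describes more concrete, the following minimal Python sketch shows taxonomy-aware item generation: a prompt is composed from a mathematical concept and a targeted cognitive level, then passed to a language model. This is not ItemForge's actual implementation; the prompt wording, the taxonomy labels, and the `llm` stand-in are assumptions introduced purely for illustration.

```python
# Illustrative sketch only: NOT the authors' ItemForge implementation.
# Prompt template, level labels, and the llm callable are assumptions.

# Cognitive process dimension of Anderson & Krathwohl's revised taxonomy.
COGNITIVE_LEVELS = [
    "remember", "understand", "apply", "analyze", "evaluate", "create",
]

def build_item_prompt(concept: str, level: str) -> str:
    """Compose a prompt requesting one free-text item plus a sample solution."""
    if level not in COGNITIVE_LEVELS:
        raise ValueError(f"unknown cognitive level: {level}")
    return (
        "You are generating an e-assessment item for a higher-education "
        "mathematics course.\n"
        f"Mathematical concept: {concept}\n"
        f"Targeted cognitive level (Anderson & Krathwohl): {level}\n"
        "Produce (1) a free-text question matching that concept and level, "
        "and (2) a complete, correct sample solution."
    )

def generate_item(concept: str, level: str, llm) -> str:
    """llm is any callable str -> str wrapping a chat model (a stand-in here)."""
    return llm(build_item_prompt(concept, level))

# During testing, a stub such as `lambda p: "Q: ... A: ..."` can stand in for
# llm; in practice it would wrap a chat-completion endpoint.
```

As the study's findings suggest, the generated question text from such a pipeline can be of usable quality while the sample solution still requires expert verification, which is why a review step like the one described in the article remains necessary.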

Keywords