Scientific Reports (Nov 2024)

Leveraging large language models to construct feedback from medical multiple-choice questions

  • Mihaela Tomova,
  • Iván Roselló Atanet,
  • Victoria Sehy,
  • Miriam Sieg,
  • Maren März,
  • Patrick Mäder

DOI
https://doi.org/10.1038/s41598-024-79245-x
Journal volume & issue
Vol. 14, no. 1
pp. 1–14

Abstract

The effectiveness of exams such as the formative Progress Test Medizin (PTM) can be enhanced by offering feedback that goes beyond numerical scores. Content-based feedback, which draws on relevant information from the exam questions, can be valuable for students: it gives them insight into their performance on the current exam and serves as a study aid and revision tool. Our goal was to use Large Language Models (LLMs) to prepare content-based feedback for the Progress Test Medizin and to evaluate their effectiveness in this task. We employ two popular LLMs and conduct a comparative assessment by performing a textual-similarity analysis on the generated outputs. Furthermore, we use a survey to study how medical practitioners and medical educators assess the capabilities of LLMs and how they perceive the use of LLMs for generating content-based feedback for PTM exams. Our findings show that the two examined LLMs performed similarly, each with its own advantages and disadvantages. The survey results indicate that one LLM produces slightly better outputs; however, this comes at a cost, since it is a paid service, while the other is free to use. Overall, the medical practitioners and educators who participated in the survey find the generated feedback relevant and useful, and they are open to using LLMs for such tasks in the future. We conclude that although the content-based feedback generated by the LLMs may not be perfect, it can nevertheless be considered a valuable addition to the numerical feedback currently provided.
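
To illustrate the kind of textual-similarity comparison mentioned in the abstract, the following is a minimal sketch in Python, assuming a TF-IDF cosine-similarity measure computed with scikit-learn. It is not the authors' actual pipeline, and the example feedback strings are hypothetical.

    # Minimal sketch (not the authors' pipeline): quantifying how similar two
    # LLM-generated feedback texts are, using TF-IDF cosine similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def feedback_similarity(text_a: str, text_b: str) -> float:
        """Return the TF-IDF cosine similarity between two feedback texts."""
        vectorizer = TfidfVectorizer()
        tfidf = vectorizer.fit_transform([text_a, text_b])
        return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

    # Hypothetical outputs from two different LLMs for the same exam question
    llm_a_feedback = "The correct answer is B because beta-blockers reduce heart rate..."
    llm_b_feedback = "Option B is correct: beta-blockers lower the heart rate and..."
    print(f"Similarity: {feedback_similarity(llm_a_feedback, llm_b_feedback):.2f}")

A score near 1 indicates that the two models produced largely overlapping feedback; lower scores flag questions where the outputs diverge and may warrant closer review.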

Keywords