Scientific Reports (Nov 2024)

Leveraging large language models to construct feedback from medical multiple-choice questions

  • Mihaela Tomova,
  • Iván Roselló Atanet,
  • Victoria Sehy,
  • Miriam Sieg,
  • Maren März,
  • Patrick Mäder

DOI
https://doi.org/10.1038/s41598-024-79245-x
Journal volume & issue
Vol. 14, no. 1
pp. 1–14

Abstract

The effectiveness of exams such as the formative Progress Test Medizin (PTM) can be enhanced by offering feedback that goes beyond numerical scores. Content-based feedback, which draws on relevant information from the exam questions, can be valuable for students: it gives them insight into their performance on the current exam and serves as a study aid and revision tool. Our goal was to use Large Language Models (LLMs) to prepare content-based feedback for the Progress Test Medizin and to evaluate their effectiveness in this task. We employ two popular LLMs and conduct a comparative assessment by performing a textual-similarity analysis on the generated outputs. Furthermore, we use a survey to study how medical practitioners and medical educators assess the capabilities of LLMs and how they perceive the use of LLMs for generating content-based feedback for PTM exams. Our findings show that the two examined LLMs performed similarly, each with its own advantages and disadvantages. The survey results indicate that one LLM produces slightly better outputs; however, this comes at a cost, since it is a paid service, while the other is free to use. Overall, the medical practitioners and educators who participated in the survey find the generated feedback relevant and useful, and they are open to using LLMs for such tasks in the future. We conclude that although the content-based feedback generated by the LLMs may not be perfect, it can nevertheless be considered a valuable addition to the numerical feedback currently provided.
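
To illustrate the kind of textual-similarity comparison mentioned in the abstract, the following is a minimal sketch in Python, assuming a TF-IDF cosine-similarity measure computed with scikit-learn. It is not the authors' actual pipeline, and the example feedback strings are hypothetical.

    # Minimal sketch (not the authors' pipeline): quantifying how similar two
    # LLM-generated feedback texts are, using TF-IDF cosine similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def feedback_similarity(text_a: str, text_b: str) -> float:
        """Return the TF-IDF cosine similarity between two feedback texts."""
        vectorizer = TfidfVectorizer()
        tfidf = vectorizer.fit_transform([text_a, text_b])
        return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

    # Hypothetical outputs from two different LLMs for the same exam question
    llm_a_feedback = "The correct answer is B because beta-blockers reduce heart rate..."
    llm_b_feedback = "Option B is correct: beta-blockers lower the heart rate and..."
    print(f"Similarity: {feedback_similarity(llm_a_feedback, llm_b_feedback):.2f}")

A score near 1 indicates that the two models produced largely overlapping feedback; lower scores flag questions where the outputs diverge and may warrant closer review.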

Keywords