ChatGPT – a tool for assisted studying or a source of misleading medical information? AI performance on Polish Medical Final Examination

Karol Żmudka; Aleksandra Spychał; Błażej Ochman; Łukasz Popowicz; Patrycja Piłat; Jerzy Jaroszewicz

doi:10.18794/aams/176450

Annales Academiae Medicae Silesiensis (Apr 2024)

Affiliations

Karol Żmudka: Department of Infectious Diseases and Hepatology, Faculty of Medical Sciences in Zabrze, Medical University of Silesia, Katowice, Poland
Aleksandra Spychał: Department of Infectious Diseases and Hepatology, Faculty of Medical Sciences in Zabrze, Medical University of Silesia, Katowice, Poland
Błażej Ochman: Department of Medical and Molecular Biology, Faculty of Medical Sciences in Zabrze, Medical University of Silesia, Katowice, Poland
Łukasz Popowicz: Department of Psychiatry, Faculty of Medical Sciences in Zabrze, Medical University of Silesia, Katowice, Poland
Patrycja Piłat: Department of Psychiatry, Faculty of Medical Sciences in Zabrze, Medical University of Silesia, Katowice, Poland
Jerzy Jaroszewicz: Department of Infectious Diseases and Hepatology, Faculty of Medical Sciences in Zabrze, Medical University of Silesia, Katowice, Poland

DOI: https://doi.org/10.18794/aams/176450
Journal volume & issue: Vol. 78
pp. 94 – 103

Abstract

Introduction: ChatGPT is a language model created by OpenAI that can engage in human-like conversations and generate text based on the input it receives. The aim of the study was to assess the overall performance of ChatGPT on the Polish Medical Final Examination (Lekarski Egzamin Końcowy – LEK) and the factors influencing the percentage of correct answers. Secondly, the capability of the chatbot to provide explanations was examined. Material and methods: We entered 591 questions with distractors from the LEK database into ChatGPT (version 13th February – 14th March). We compared the results with the answer key and analyzed the provided explanations for logical justification. For the correct answers, we analyzed the logical consistency of the explanation, while for the incorrect answers, we observed the ability to provide a correction. Selected factors were analyzed for their influence on the chatbot’s performance. Results: ChatGPT achieved impressive scores of 58.16%, 60.91% and 67.86%, allowing it to pass the official threshold of 56% in all instances. Of the properly answered questions, more than 70% were backed by a logically coherent explanation. In the case of the wrongly answered questions, the chatbot provided a seemingly correct explanation for false information in 66% of the cases. Factors such as logical construction (p < 0.05) and difficulty (p < 0.05) had an influence on the overall score, whereas question length (p = 0.46) and language (p = 0.14) did not. Conclusions: Although it achieved a score sufficient to pass the LEK, ChatGPT in many cases provides misleading information backed by a seemingly compelling explanation. The chatbot can be especially misleading for non-medical users as compared to a web search because it can provide instant compelling explanations. Thus, if used improperly, it could pose a danger to public health. This makes it a problematic recommendation for assisted studying.

Published in Annales Academiae Medicae Silesiensis

ISSN: 1734-025X (Online)
Publisher: Śląski Uniwersytet Medyczny w Katowicach
Country of publisher: Poland
LCC subjects: Medicine: Pharmacy and materia medica; Medicine: Dentistry
Website: https://annales.sum.edu.pl/
