Annales Academiae Medicae Silesiensis (Oct 2024)

Performance of ChatGPT-3.5 and ChatGPT-4 in the field of specialist medical knowledge on National Specialization Exam in neurosurgery

  • Maciej Laskowski,
  • Marcin Ciekalski,
  • Marcin Laskowski,
  • Bartłomiej Błaszczyk,
  • Marcin Setlak,
  • Piotr Paździora,
  • Adam Rudnik

DOI
https://doi.org/10.18794/aams/186827
Journal volume & issue
Vol. 78
pp. 253 – 258

Abstract


Introduction: In recent years, there has been a growing number of publications on artificial intelligence (AI) in medicine in general and in neurosurgery in particular. Studies integrating AI into neurosurgical practice suggest an ongoing shift towards greater reliance on AI-assisted tools for diagnostics, image analysis, and decision-making.

Material and methods: The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on the neurosurgery exam from Autumn 2017, the most recent exam with officially published answers available on the website of the Medical Examinations Center (Centrum Egzaminów Medycznych – CEM) in Łódź, Poland. The passing score for the National Specialization Exam (Państwowy Egzamin Specjalizacyjny – PES) in Poland, as administered by CEM, is 56% of the valid questions. The exam comprised 116 single-choice questions after four outdated questions were eliminated. The questions were categorized into ten thematic groups based on the subjects they addressed. For data collection, both ChatGPT versions were briefed on the exam rules and asked to rate their confidence in each answer on a scale from 1 (definitely not sure) to 5 (definitely sure). All interactions were conducted in Polish and were recorded.

Results: ChatGPT-4 significantly outperformed ChatGPT-3.5, with a 29.4% margin (p < 0.001). Unlike ChatGPT-3.5, ChatGPT-4 reached the passing threshold for the PES. ChatGPT-3.5 and ChatGPT-4 gave the same answer to 61 questions (52.58%); of these, both were correct on 28 questions (24.14%) and both were incorrect on 33 questions (28.45%).

Conclusions: ChatGPT-4 shows improved accuracy over ChatGPT-3.5, likely owing to more advanced algorithms and a broader training dataset, highlighting its better grasp of complex neurosurgical concepts.
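
The abstract reports a pass threshold of 56% of 116 valid questions and a paired comparison of the two models' answers on the same items. The sketch below shows, under stated assumptions, how such a threshold and a paired (McNemar-style) significance test could be computed; it is not the authors' analysis code. Only the totals (116 questions, 61 identical answers, 28 both-correct, 33 both-incorrect) come from the abstract, the discordant counts are hypothetical placeholders, and scipy is assumed as the statistics library.

```python
# Minimal sketch, assuming scipy is available; counts marked hypothetical
# are NOT reported in the abstract.
from scipy.stats import binomtest

TOTAL_QUESTIONS = 116
PASS_FRACTION = 0.56                      # CEM threshold: 56% of valid questions
pass_mark = PASS_FRACTION * TOTAL_QUESTIONS
print(f"Correct answers needed to pass: at least {int(pass_mark) + 1} of {TOTAL_QUESTIONS}")

# Paired comparison: only questions where the two models disagree
# carry information about which model is better.
only_gpt4_correct = 45                    # hypothetical discordant count
only_gpt35_correct = 10                   # hypothetical discordant count
discordant = only_gpt4_correct + only_gpt35_correct

# Exact McNemar test = binomial test on the discordant pairs under p = 0.5.
result = binomtest(only_gpt4_correct, discordant, p=0.5)
print(f"Exact McNemar p-value: {result.pvalue:.4g}")
```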

Keywords