Journal of Educational Evaluation for Health Professions (Nov 2023)
Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study
Abstract
Purpose: We aimed to describe the performance and evaluate the educational value of the justifications provided by artificial intelligence chatbots (GPT-3.5, GPT-4, Bard, Claude, and Bing) on the Peruvian National Licensing Medical Examination (P-NLME).
Methods: This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Four medical educators then categorized the MCQs by medical area, item type, and whether the MCQ required Peru-specific knowledge. They also assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing).
Results: GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude; by comparison, the historical performance of Peruvian examinees was 55%. Among the factors examined, only MCQs requiring Peru-specific knowledge had lower odds of being answered correctly (odds ratio, 0.23; 95% confidence interval, 0.09–0.61); the remaining factors showed no associations. In the assessment of the educational value of the justifications provided by GPT-4 and Bing, no significant differences were found in certainty, usefulness, or potential use in the classroom.
Conclusion: GPT-4 and Bing were the top-performing chatbots, with Bing performing better on Peru-specific MCQs. Moreover, the educational value of the justifications provided by GPT-4 and Bing can be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations.