ChatGPT’s Response Consistency: A Study on Repeated Queries of Medical Examination Questions

Paul F. Funk; Cosima C. Hoch; Samuel Knoedler; Leonard Knoedler; Sebastian Cotofana; Giuseppe Sofo; Ali Bashiri Dezfouli; Barbara Wollenberg; Orlando Guntinas-Lichius; Michael Alfertshofer

doi:10.3390/ejihpe14030043

European Journal of Investigation in Health, Psychology and Education (Mar 2024)

Affiliations

Paul F. Funk: Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Jena, Friedrich Schiller University Jena, Am Klinikum 1, 07747 Jena, Germany
Cosima C. Hoch: Department of Otolaryngology, Head and Neck Surgery, School of Medicine and Health, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
Samuel Knoedler: Department of Plastic Surgery and Hand Surgery, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
Leonard Knoedler: Division of Plastic and Reconstructive Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114, USA
Sebastian Cotofana: Department of Dermatology, Erasmus Medical Centre, Dr. Molewaterplein 40, 3015 GD Rotterdam, The Netherlands
Giuseppe Sofo: Instituto Ivo Pitanguy, Hospital Santa Casa de Misericórdia Rio de Janeiro, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro 20020-022, Brazil
Ali Bashiri Dezfouli: Department of Otolaryngology, Head and Neck Surgery, School of Medicine and Health, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
Barbara Wollenberg: Department of Otolaryngology, Head and Neck Surgery, School of Medicine and Health, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
Orlando Guntinas-Lichius: Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Jena, Friedrich Schiller University Jena, Am Klinikum 1, 07747 Jena, Germany
Michael Alfertshofer: Department of Plastic Surgery and Hand Surgery, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany

Journal volume & issue: Vol. 14, no. 3
pp. 657 – 668

Abstract

(1) Background: As the field of artificial intelligence (AI) evolves, tools such as ChatGPT are increasingly being integrated into various domains of medicine, including medical education and research. Given the critical nature of medicine, it is of paramount importance that AI tools provide highly reliable information. (2) Methods: A total of n = 450 medical examination questions were manually entered three times each into both ChatGPT 3.5 and ChatGPT 4. The responses were collected, and their accuracy and consistency across the repeated entries were analyzed statistically. (3) Results: ChatGPT 4 achieved a statistically significantly higher accuracy than ChatGPT 3.5 (85.7% vs. 57.7%; p < 0.001). (4) Conclusions: The findings underscore the greater accuracy and dependability of ChatGPT 4 in the context of medical education and potential clinical decision making. Nonetheless, the research emphasizes the indispensable nature of human-delivered healthcare and the vital role of continuous assessment in leveraging AI in medicine.
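The Methods describe scoring each question over three repeated entries for both accuracy and consistency. The exact definitions used in the paper are not given here, so the following is only a minimal sketch under stated assumptions. It assumes multiple-choice answers recorded as letters, one list per round aligned with an answer key. It computes overall accuracy over all answers and two possible readings of "consistency": correct in every round, and identical answer in every round. The function names, data layout, and toy data are hypothetical and are not study data.

```python
from typing import Sequence

# Hypothetical data layout (not from the paper): one list of answer letters
# per round, each list aligned with the answer key by question index.
Rounds = Sequence[Sequence[str]]


def accuracy(rounds: Rounds, key: Sequence[str]) -> float:
    """Share of all (question, round) answers that match the answer key."""
    total = sum(len(answers) for answers in rounds)
    correct = sum(
        answer == truth
        for answers in rounds
        for answer, truth in zip(answers, key)
    )
    return correct / total


def correct_in_every_round(rounds: Rounds, key: Sequence[str]) -> float:
    """Share of questions answered correctly in all rounds (assumed definition)."""
    hits = sum(
        all(answers[i] == key[i] for answers in rounds) for i in range(len(key))
    )
    return hits / len(key)


def same_answer_every_round(rounds: Rounds) -> float:
    """Share of questions that received an identical answer in all rounds,
    regardless of whether that answer was correct (assumed definition)."""
    n_questions = len(rounds[0])
    hits = sum(
        len({answers[i] for answers in rounds}) == 1 for i in range(n_questions)
    )
    return hits / n_questions


# Toy example with 4 questions and 3 rounds (invented, not study data).
EXAMPLE_KEY = ["A", "C", "B", "D"]
EXAMPLE_ROUNDS = [
    ["A", "C", "B", "A"],
    ["A", "B", "B", "A"],
    ["A", "C", "B", "A"],
]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(EXAMPLE_ROUNDS, EXAMPLE_KEY):.3f}")
    print(f"correct in every round: {correct_in_every_round(EXAMPLE_ROUNDS, EXAMPLE_KEY):.3f}")
    print(f"same answer every round: {same_answer_every_round(EXAMPLE_ROUNDS):.3f}")
```

For the toy data this prints an accuracy of 0.667 (8 of 12 answers), 0.500 correct in every round, and 0.750 with the same answer in every round. The fourth question is answered "A" in all three rounds while the key is "D", which shows why a response can be consistent without being correct. Because both model versions answer the same questions, a paired comparison (for example McNemar's test) would be one reasonable choice for testing differences; the abstract does not state which test the authors used.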

Published in European Journal of Investigation in Health, Psychology and Education

ISSN: 2174-8144 (Print); 2254-9625 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Medicine: Public aspects of medicine; Philosophy. Psychology. Religion: Psychology
Website: https://www.mdpi.com/journal/ejihpe