Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for AI-assisted medical education and decision making in radiation oncology

Yixing Huang; Ahmed Gomaa; Sabine Semrau; Marlen Haderlein; Sebastian Lettmaier; Thomas Weissmann; Johanna Grigo; Hassen Ben Tkhayat; Benjamin Frey; Udo Gaipl; Luitpold Distel; Andreas Maier; Rainer Fietkau; Christoph Bert; Florian Putz

doi:10.3389/fonc.2023.1265024

Frontiers in Oncology (Sep 2023)

Affiliations

Yixing Huang: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Yixing Huang: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Ahmed Gomaa: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Ahmed Gomaa: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Sabine Semrau: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Sabine Semrau: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Marlen Haderlein: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Marlen Haderlein: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Sebastian Lettmaier: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Sebastian Lettmaier: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Thomas Weissmann: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Thomas Weissmann: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Johanna Grigo: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Johanna Grigo: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Hassen Ben Tkhayat: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Hassen Ben Tkhayat: Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Benjamin Frey: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Benjamin Frey: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Udo Gaipl: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Udo Gaipl: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Luitpold Distel: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Luitpold Distel: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Andreas Maier: Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Rainer Fietkau: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Rainer Fietkau: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Christoph Bert: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Christoph Bert: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
Florian Putz: Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Florian Putz: Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany

DOI: https://doi.org/10.3389/fonc.2023.1265024
Journal volume & issue: Vol. 13

Abstract

Purpose

The potential of large language models in medical education and decision-making has been demonstrated by their decent scores on medical exams such as the United States Medical Licensing Exam (USMLE) and the MedQA exam. This work aims to evaluate the performance of ChatGPT-4 in the specialized field of radiation oncology.

Methods

The 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases are used to benchmark the performance of ChatGPT-4. The TXIT exam contains 300 questions covering various topics in radiation oncology. The 2022 Gray Zone collection contains 15 complex clinical cases.

Results

On the TXIT exam, ChatGPT-3.5 and ChatGPT-4 achieved scores of 62.05% and 78.77%, respectively, highlighting the advantage of the newer ChatGPT-4 model. The TXIT results also partially identify ChatGPT-4's strong and weak areas in radiation oncology. Specifically, by ACR knowledge domain, ChatGPT-4 demonstrates better knowledge of statistics, CNS & eye, pediatrics, biology, and physics than of bone & soft tissue and gynecology. Regarding clinical care paths, ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than in brachytherapy and dosimetry. It lacks proficiency in the in-depth details of clinical trials. For the Gray Zone cases, ChatGPT-4 is able to suggest a personalized treatment approach for each case with high correctness and comprehensiveness. Importantly, for many cases it suggests novel treatment aspects that none of the human experts proposed.

Conclusion

Both evaluations demonstrate the potential of ChatGPT-4 in medical education for the general public and cancer patients, as well as its potential to aid clinical decision-making, while acknowledging its limitations in certain domains. Owing to the risk of hallucinations, content generated by models such as ChatGPT must be verified for accuracy.
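
The per-domain comparison described in the abstract amounts to grouping graded exam answers by knowledge domain and ranking the resulting accuracies. The sketch below illustrates that bookkeeping only; it is not the authors' code. It assumes every question counts equally (the official TXIT scoring may differ), and the function name `domain_accuracy` and the toy answers are hypothetical, not data from the study.

```python
from collections import defaultdict


def domain_accuracy(graded):
    """Return (overall %, {domain: %}) from (domain, is_correct) pairs.

    Assumption: each question has equal weight; this may not match
    the official TXIT scoring scheme.
    """
    if not graded:
        raise ValueError("no graded answers")
    totals = defaultdict(int)   # questions seen per domain
    correct = defaultdict(int)  # correct answers per domain
    for domain, is_correct in graded:
        totals[domain] += 1
        correct[domain] += int(is_correct)
    overall = 100.0 * sum(correct.values()) / sum(totals.values())
    per_domain = {d: 100.0 * correct[d] / totals[d] for d in totals}
    return overall, per_domain


# Hypothetical toy data, NOT the TXIT results reported in the abstract.
graded = [
    ("statistics", True), ("statistics", True),
    ("gynecology", True), ("gynecology", False),
]
overall, per_domain = domain_accuracy(graded)

# List domains from weakest to strongest to flag areas needing review.
for domain, acc in sorted(per_domain.items(), key=lambda kv: kv[1]):
    print(f"{domain}: {acc:.1f}%")
print(f"overall: {overall:.2f}%")
```

With real graded answers, the weakest domains at the top of the printed list would correspond to the "weak areas" discussed in the Results.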

Published in Frontiers in Oncology

ISSN: 2234-943X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens
Website: https://www.frontiersin.org/journals/oncology/