BMJ Open Quality (Apr 2024)

Performance evaluation of ChatGPT in detecting diagnostic errors and their contributing factors: an analysis of 545 case reports of diagnostic errors

  • Yukinori Harada,
  • Taku Harada,
  • Tetsu Sakamoto,
  • Taro Shimizu,
  • Kotaro Kunitomo,
  • Hiroyuki Nagano,
  • Takashi Watari,
  • Kosuke Ishizuka,
  • Tomoharu Suzuki,
  • Taiju Miyagami,
  • Ren Kawamura

DOI
https://doi.org/10.1136/bmjoq-2023-002654
Journal volume & issue
Vol. 13, no. 2

Abstract


Background Manual chart review using validated assessment tools is a standardised methodology for detecting diagnostic errors. However, it requires considerable human resources and time. ChatGPT, a recently developed artificial intelligence chatbot based on a large language model, can effectively classify text when given suitable prompts, and could therefore assist manual chart review in detecting diagnostic errors.

Objective This study aimed to clarify whether ChatGPT can correctly detect diagnostic errors and possible contributing factors from case presentations.

Methods We analysed 545 published case reports that included diagnostic errors. We input the texts of the case presentations and the final diagnoses, together with original prompts, into ChatGPT (GPT-4) to generate responses comprising a judgement on the presence of a diagnostic error and the factors contributing to it. Contributing factors were coded according to three taxonomies: Diagnosis Error Evaluation and Research (DEER), Reliable Diagnosis Challenges (RDC) and Generic Diagnostic Pitfalls (GDP). ChatGPT's responses on contributing factors were compared with those of physicians.

Results ChatGPT correctly detected diagnostic errors in 519/545 cases (95%) and coded significantly more contributing factors per case than physicians: DEER (median 5 vs 1, p<0.001), RDC (median 4 vs 2, p<0.001) and GDP (median 4 vs 1, p<0.001). The most important contributing factors coded by ChatGPT were 'failure/delay in considering the diagnosis' (315, 57.8%) in DEER, 'atypical presentation' (365, 67.0%) in RDC and 'atypical presentation' (264, 48.4%) in GDP.

Conclusion ChatGPT accurately detects diagnostic errors from case presentations. ChatGPT may be more sensitive than manual review in detecting factors contributing to diagnostic errors, especially 'atypical presentation'.
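
The Methods describe prompting GPT-4 with each case presentation and final diagnosis and asking it to judge whether a diagnostic error occurred and to code contributing factors against the DEER, RDC and GDP taxonomies. The study's exact prompts and workflow are not reproduced in this abstract; the snippet below is only a minimal sketch of how such a classification step could be scripted against the OpenAI chat API, assuming API access rather than the ChatGPT web interface. The prompt wording, the abbreviated taxonomy list and the helper name classify_case are illustrative assumptions, not the authors' materials.

```python
# Hedged sketch (not the study's code): ask GPT-4 whether a case report
# contains a diagnostic error and to code contributing factors against a
# small, hypothetical subset of DEER categories.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

DEER_SUBSET = [  # abbreviated, illustrative subset of DEER items
    "failure/delay in considering the diagnosis",
    "erroneous laboratory/radiology reading",
    "failure/delay in ordering needed test(s)",
]

def classify_case(case_presentation: str, final_diagnosis: str) -> str:
    """Return GPT-4's judgement on diagnostic error and coded DEER factors."""
    prompt = (
        "You will be given a case presentation and the final diagnosis.\n"
        "1. State whether a diagnostic error occurred (yes/no).\n"
        "2. List every contributing factor, choosing only from this DEER "
        f"taxonomy: {', '.join(DEER_SUBSET)}.\n\n"
        f"Case presentation:\n{case_presentation}\n\n"
        f"Final diagnosis: {final_diagnosis}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the category coding as reproducible as possible
    )
    return response.choices[0].message.content

# Example call with placeholder text, not a case from the study dataset.
print(classify_case(
    "A 45-year-old man presented with chest pain initially treated as reflux...",
    "Aortic dissection",
))
```

In practice such a script would loop over all 545 case reports and parse the model's answers into the error judgement and taxonomy codes before comparing them with physician coding, as the study did manually.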