Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study

Hiromizu Takahashi; Kiyoshi Shikino; Takeshi Kondo; Akira Komori; Yuji Yamada; Mizue Saita; Toshio Naito

doi:10.2196/59133

JMIR Medical Education (Aug 2024)

DOI: https://doi.org/10.2196/59133
Journal volume & issue: Vol. 10, p. e59133

Abstract

Background: Evaluating the accuracy and educational utility of artificial intelligence–generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored.

Objective: This study aimed to assess the educational utility of ChatGPT-4–generated clinical vignettes and their applicability in educational settings.

Methods: Using a convergent mixed methods design, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. In the survey, 6 main question items were used to evaluate the quality of the generated clinical vignettes and their educational utility, namely information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians specializing in general internal medicine or general medicine and experienced in medical education. Chi-square and Mann-Whitney U tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians’ experience. Thematic analysis of qualitative feedback was performed to identify areas for improvement and confirm the educational utility of the cases.

Results: Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory, based on binary responses. On a 5-point Likert scale, the average scores assigned were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty. Statistical analysis showed significant variability in content quality and relevance across the cases (P<.001 after Bonferroni correction). Participants suggested improvements in generating physical findings, using natural language, and enhancing medical TA. The thematic analysis highlighted the need for clearer documentation, clinical information consistency, content relevance, and patient-centered case presentations.

Conclusions: ChatGPT-4–generated medical cases written in Japanese possess considerable potential as resources in medical education, with recognized adequacy in quality and accuracy. Nevertheless, there is a notable need for enhancements in the precision and realism of case details. This study emphasizes ChatGPT-4’s value as an adjunctive educational tool in the medical field, requiring expert oversight for optimal application.
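For readers unfamiliar with the summary statistics reported in the Results, the following is a minimal sketch of how a mean with a 95% CI and a Bonferroni-adjusted P value can be computed. It is not the authors' analysis code: the abstract does not state how the CIs were derived, so the normal approximation used here is an assumption, and the ratings and P values are hypothetical placeholders, not study data.

```python
import math


def mean_ci95(values):
    """Return (mean, lower, upper) with a normal-approximation 95% CI.

    Assumption: the abstract does not say which CI method was used.
    """
    n = len(values)
    mean = sum(values) / n
    # Sample variance with the n - 1 denominator.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    half_width = 1.96 * math.sqrt(var / n)
    return mean, mean - half_width, mean + half_width


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    binary_ratings = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]   # satisfactory yes/no
    likert_ratings = [4, 3, 4, 5, 3, 4, 2, 4, 3, 4]   # 5-point scale

    for label, data in [("binary", binary_ratings), ("likert", likert_ratings)]:
        mean, lo, hi = mean_ci95(data)
        print(f"{label}: mean {mean:.2f}, 95% CI {lo:.2f}-{hi:.2f}")

    # Hypothetical raw P values from several pairwise comparisons.
    print(bonferroni([0.0001, 0.02, 0.04]))
```

For a binary item, the mean is simply the proportion of "satisfactory" responses, which is why the abstract can report it on a 0 to 1 scale alongside the Likert means.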

Published in JMIR Medical Education

ISSN: 2369-3762 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Education: Special aspects of education; Medicine: Medicine (General)
Website: https://mededu.jmir.org
