JMIR Medical Education (Mar 2024)

Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study

  • Takahiro Nakao,
  • Soichiro Miki,
  • Yuta Nakamura,
  • Tomohiro Kikuchi,
  • Yukihiro Nomura,
  • Shouhei Hanaoka,
  • Takeharu Yoshikawa,
  • Osamu Abe

DOI: https://doi.org/10.2196/54393
Journal volume & issue: Vol. 10, e54393

Abstract


Background: Previous research applying large language models (LLMs) to medicine focused on text-based information. Recently, multimodal variants of LLMs have acquired the capability to recognize images.

Objective: We aim to evaluate the image recognition capability of generative pretrained transformer (GPT)-4V, a recent multimodal LLM developed by OpenAI, in the medical field by testing how visual information affects its performance in answering questions from the 117th Japanese National Medical Licensing Examination.

Methods: We focused on the 108 questions that included 1 or more images and presented GPT-4V with each question under 2 conditions: (1) with both the question text and the associated images and (2) with the question text only. We then compared the accuracy between the 2 conditions using the exact McNemar test.

Results: Among the 108 questions with images, GPT-4V's accuracy was 68% (73/108) when presented with images and 72% (78/108) when presented without images (P=.36). For the 2 question categories, clinical and general, the accuracies with versus without images were 71% (70/98) versus 78% (76/98; P=.21) and 30% (3/10) versus 20% (2/10; P≥.99), respectively.

Conclusions: The additional information from the images did not significantly improve the performance of GPT-4V on the Japanese National Medical Licensing Examination.
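As context for the Methods, the exact McNemar test compares paired binary outcomes (here, whether each question was answered correctly with and without images), basing the P value only on the discordant pairs. The sketch below shows how such a comparison could be run in Python with statsmodels; the 2x2 cell counts are hypothetical placeholders chosen only to match the reported marginals (73/108 and 78/108), since the abstract does not report the per-question discordant counts.

```python
# Minimal sketch of an exact McNemar test on paired correctness data,
# assuming per-question correctness was recorded under both conditions.
# The cell counts below are hypothetical, consistent only with the
# reported marginals (73/108 correct with images, 78/108 without).
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table of paired outcomes:
#                       without images: correct | incorrect
# with images: correct              a           |     b
# with images: incorrect            c           |     d
table = [
    [65, 8],   # hypothetical split of the 73 questions correct with images
    [13, 22],  # hypothetical split of the 35 questions incorrect with images
]

# exact=True uses the binomial (exact) form of McNemar's test,
# appropriate when the number of discordant pairs (b + c) is small.
result = mcnemar(table, exact=True)
print(f"statistic={result.statistic}, p-value={result.pvalue:.3f}")
```

Because only the b and c cells (questions answered correctly under exactly one condition) enter the test, similar overall accuracies can still yield a significant result if the discordant pairs are lopsided, which is why a paired test is preferable here to comparing the two 68% and 72% proportions independently.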