Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study

Zelin Wu; Wenyi Gan; Zhaowen Xue; Zhengxin Ni; Xiaofei Zheng; Yiyi Zhang

doi:10.2196/52746

JMIR Medical Education (Oct 2024)

DOI: https://doi.org/10.2196/52746
Journal volume & issue: Vol. 10, p. e52746

Abstract

Background: The creation of large language models (LLMs) such as ChatGPT is an important step in the development of artificial intelligence. LLMs show great potential in medical education because of their powerful language understanding and generative capabilities. The purpose of this study was to quantitatively evaluate and comprehensively analyze ChatGPT’s performance on nursing licensure examination questions from the United States and China, namely the National Council Licensure Examination for Registered Nurses (NCLEX-RN) and the National Nursing Licensure Examination (NNLE).

Objective: This study aims to examine how well LLMs respond to NCLEX-RN and NNLE multiple-choice questions (MCQs) in different input languages, to evaluate whether LLMs can be used as multilingual learning aids for nursing, and to assess whether they possess a repository of professional knowledge applicable to clinical nursing practice.

Methods: First, we compiled 150 NCLEX-RN Practical MCQs, 240 NNLE Theoretical MCQs, and 240 NNLE Practical MCQs. Then, the translation function of ChatGPT 3.5 was used to translate the NCLEX-RN questions from English to Chinese and the NNLE questions from Chinese to English. Finally, the original and translated versions of the MCQs were entered into ChatGPT 4.0, ChatGPT 3.5, and Google Bard. The LLMs were compared by accuracy rate, and the differences between input languages were also compared.

Results: The accuracy rates of ChatGPT 4.0 for NCLEX-RN practical questions and Chinese-translated NCLEX-RN practical questions were 88.7% (133/150) and 79.3% (119/150), respectively. Despite the statistical significance of the difference (P […]

Conclusions: This study, focusing on 618 nursing MCQs from the NCLEX-RN and NNLE exams, found that ChatGPT 4.0 outperformed ChatGPT 3.5 and Google Bard in accuracy. ChatGPT 4.0 excelled in processing both English and Chinese inputs, underscoring its potential as a valuable tool in nursing education and clinical decision-making.
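
The ChatGPT 4.0 accuracy rates quoted in the Results can be reproduced from the raw counts with a few lines of Python. The sketch below also compares the two proportions with an uncorrected Pearson chi-square test on a 2x2 table. The abstract does not say which statistical test the authors used, so the choice of test, the function names, and the resulting P value are assumptions made for illustration and are not the study's reported analysis.

```python
import math


def accuracy_pct(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Uncorrected Pearson chi-square test for two independent proportions.

    Assumption: the abstract does not name the test used in the study;
    this is only one common way to compare two accuracy rates.
    Returns (chi-square statistic, two-sided P value) with 1 degree of freedom.
    """
    a, b = correct_a, total_a - correct_a   # group A: correct, incorrect
    c, d = correct_b, total_b - correct_b   # group B: correct, incorrect
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the Results: ChatGPT 4.0 on NCLEX-RN practical MCQs.
english = (133, 150)   # original English questions
chinese = (119, 150)   # Chinese-translated questions

print(accuracy_pct(*english), accuracy_pct(*chinese))
chi2, p_value = chi_square_2x2(*english, *chinese)
print(f"chi2 = {chi2:.2f}, P = {p_value:.3f}")
```

Under these assumptions the difference between the English and Chinese-translated inputs falls below the conventional .05 threshold, which is consistent with the abstract describing the difference as statistically significant.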

Published in JMIR Medical Education

ISSN: 2369-3762 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Education: Special aspects of education; Medicine: Medicine (General)
Website: https://mededu.jmir.org
