Differentiating ChatGPT-Generated and Human-Written Medical Texts: Quantitative Study

Wenxiong Liao; Zhengliang Liu; Haixing Dai; Shaochen Xu; Zihao Wu; Yiyang Zhang; Xiaoke Huang; Dajiang Zhu; Hongmin Cai; Quanzheng Li; Tianming Liu; Xiang Li

doi:10.2196/48904

JMIR Medical Education (Dec 2023)

DOI: https://doi.org/10.2196/48904
Journal volume & issue: Vol. 9
p. e48904

Abstract

Background: Large language models, such as ChatGPT, are capable of generating grammatically perfect and human-like text content, and a large number of ChatGPT-generated texts have appeared on the internet. However, medical texts, such as clinical notes and diagnoses, require rigorous validation, and erroneous medical content generated by ChatGPT could potentially lead to disinformation that poses significant harm to health care and the general public.

Objective: This study is among the first on responsible artificial intelligence–generated content in medicine. We focus on analyzing the differences between medical texts written by human experts and those generated by ChatGPT and designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT.

Methods: We first constructed a suite of data sets containing medical texts written by human experts and generated by ChatGPT. We analyzed the linguistic features of these 2 types of content and uncovered differences in vocabulary, parts-of-speech, dependency, sentiment, perplexity, and other aspects. Finally, we designed and implemented machine learning methods to detect medical text generated by ChatGPT. The data and code used in this paper are published on GitHub.

Results: Medical texts written by humans were more concrete, more diverse, and typically contained more useful information, while medical texts generated by ChatGPT paid more attention to fluency and logic and usually expressed general terminologies rather than effective information specific to the context of the problem. A bidirectional encoder representations from transformers–based model effectively detected medical texts generated by ChatGPT, and the F1 score exceeded 95%.

Conclusions: Although text generated by ChatGPT is grammatically perfect and human-like, the linguistic characteristics of generated medical texts were different from those written by human experts. Medical text generated by ChatGPT could be effectively detected by the proposed machine learning algorithms. This study provides a pathway toward trustworthy and accountable use of large language models in medicine.
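
The F1 score cited in the abstract is the harmonic mean of precision and recall. The following is a minimal sketch of how that metric could be computed for a binary detector. It is not the authors' published code: the function name, the label convention (1 = ChatGPT-generated, 0 = human-written), and the sample labels are assumptions made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (illustration only, not data from the study).
# 1 = ChatGPT-generated, 0 = human-written.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

In this toy example the detector makes one false positive and one false negative against three true positives, so precision and recall are both 0.75 and so is F1.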

Published in JMIR Medical Education

ISSN: 2369-3762 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Education: Special aspects of education; Medicine: Medicine (General)
Website: https://mededu.jmir.org
