Assessing the possibility of using large language models in ocular surface diseases

Qian Ling; Zi-Song Xu; Yan-Mei Zeng; Qi Hong; Xian-Zhe Qian; Jin-Yu Hu; Chong-Gang Pei; Hong Wei; Jie Zou; Cheng Chen; Xiao-Yu Wang; Xu Chen; Zhen-Kai Wu; Yi Shao

doi:10.18240/ijo.2025.01.01

International Journal of Ophthalmology (Jan 2025)

Affiliations

Correspondence to: Yi Shao, Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai 200080, China; Zhen-Kai Wu, Changde Hospital, Xiangya School of Medicine, Central South University (the First People's Hospital of Changde City), Changde 415000, Hunan Province, China
Zi-Song Xu: Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China
Yan-Mei Zeng: Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China
Qi Hong: Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China
Xian-Zhe Qian: Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China
Jin-Yu Hu: Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China
Chong-Gang Pei: Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China
Hong Wei: Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China
Jie Zou: Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China
Cheng Chen: Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China
Xiao-Yu Wang: Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China
Xu Chen: Ophthalmology Centre of Maastricht University, Maastricht 6200MS, Limburg, Netherlands
Zhen-Kai Wu: Changde Hospital, Xiangya School of Medicine, Central South University (the First People's Hospital of Changde City), Changde 415000, Hunan Province, China
Yi Shao: Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai 200080, China

DOI: https://doi.org/10.18240/ijo.2025.01.01
Journal volume & issue: Vol. 18, no. 1
pp. 1 – 8

Abstract

AIM: To assess the possibility of using large language models (LLMs) in ocular surface diseases by testing the accuracy of five LLMs in answering specialized questions on these diseases: ChatGPT-4, ChatGPT-3.5, Claude 2, PaLM2, and SenseNova.

METHODS: A group of experienced ophthalmology professors developed a 100-question single-choice examination on ocular surface diseases, designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions. The exam covered the following topics: keratitis (20 questions); keratoconus, keratomalacia, corneal dystrophy, corneal degeneration, erosive corneal ulcers, and corneal lesions associated with systemic diseases (20 questions); conjunctivitis (20 questions); trachoma, pterygium, and conjunctival tumors (20 questions); and dry eye disease (20 questions). The total score of each LLM was then calculated, and the mean scores, mean correlations, variances, and confidence of the LLMs were compared.

RESULTS: Among the LLMs, ChatGPT-4 exhibited the highest performance. When the average scores of the LLMs were compared with those of four human groups (chief physicians, attending physicians, regular trainees, and graduate students), every LLM except ChatGPT-4 had a lower total score than the graduate student group, which had the lowest score among the human groups. Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers, with very little chance of an incorrect answer. ChatGPT-4 showed higher credibility when answering questions, with a success rate of 59%, but it gave a wrong answer 28% of the time.

CONCLUSION: ChatGPT-4 exhibits excellent performance in both answer relevance and confidence. PaLM2 shows a positive correlation (up to 0.8) in terms of answer accuracy during the exam. In terms of answer confidence, PaLM2 is second only to ChatGPT-4 and surpasses Claude 2, SenseNova, and ChatGPT-3.5. Although ocular surface disease is a highly specialized discipline, ChatGPT-4 still exhibits superior performance, suggesting that it has enormous potential for application in this field and may become a valuable resource for medical students and clinicians in the future.

Published in International Journal of Ophthalmology

ISSN: 2222-3959 (Print); 2227-4898 (Online)
Publisher: Press of International Journal of Ophthalmology (IJO PRESS)
Country of publisher: China
LCC subjects: Medicine: Ophthalmology
Website: http://www.ijo.cn/gjyken/ch/index.aspx
