Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries

Krithi Pushpanathan; Zhi Wei Lim; Samantha Min Er Yew; David Ziyou Chen; Hazel Anne Hui'En Lin; Jocelyn Hui Lin Goh; Wendy Meihua Wong; Xiaofei Wang; Marcus Chun Jin Tan; Victor Teck Chang Koh; Yih-Chung Tham

iScience (Nov 2023)

Affiliations

Krithi Pushpanathan: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Zhi Wei Lim: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Samantha Min Er Yew: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
David Ziyou Chen: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
Hazel Anne Hui'En Lin: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
Jocelyn Hui Lin Goh: Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
Wendy Meihua Wong: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
Xiaofei Wang: Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing, China; Advanced Innovation Centre for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China
Marcus Chun Jin Tan: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
Victor Teck Chang Koh: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
Yih-Chung Tham: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore; Ophthalmology and Visual Sciences Academic Clinical Programme (Eye ACP), Duke NUS Medical School, Singapore, Singapore; Corresponding author

Journal volume & issue: Vol. 26, no. 11, p. 108163

Abstract

Summary: In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in responding to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. Of the ChatGPT-4.0 responses, 89.2% were rated ‘good’, significantly outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) (all p < 0.001). All three LLM-Chatbots also achieved high mean comprehensiveness scores (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Rigorous future validation of the chatbots’ performance is crucial to ensure their reliability and appropriateness for actual clinical use.

Published in iScience

ISSN: 2589-0042 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Science
Website: http://www.cell.com/iscience/home
