Taiwan Journal of Ophthalmology (Sep 2024)
Investigating the comparative superiority of artificial intelligence programs in assessing knowledge levels regarding ocular inflammation, uvea diseases, and treatment modalities
Abstract
PURPOSE: The purpose of the study was to evaluate the knowledge level of the Chat Generative Pretrained Transformer (ChatGPT), Bard, and Bing artificial intelligence (AI) chatbots regarding ocular inflammation, uveal diseases, and treatment modalities, and to compare their performance with one another.
MATERIALS AND METHODS: Thirty-six questions related to ocular inflammation, uveal diseases, and treatment modalities were posed to the ChatGPT, Bard, and Bing AI chatbots, and each response was recorded as correct or incorrect. Accuracy rates were compared using the chi-squared test.
RESULTS: ChatGPT answered 19 of the 36 questions (52.8%) correctly, Bard 14 (38.9%), and Bing 16 (44.4%). All three AI programs gave identical responses to 20 (55.6%) of the questions; 9 (45%) of these shared responses were correct and 11 (55%) were incorrect. The accuracy rates of the three chatbots did not differ significantly (P = 0.654).
CONCLUSION: AI chatbots should be further developed so that they can provide widely accessible, accurate information about ocular inflammation, uveal diseases, and treatment modalities. Future research could explore ways to improve the performance of these chatbots.
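The chi-squared comparison of accuracy rates can be illustrated with a Pearson test of independence on a 3 × 2 table of chatbot versus correct/incorrect answers. The sketch below is a minimal, standard-library Python example. It assumes the counts 19/36, 14/36, and 16/36, which are back-calculated from the reported percentages. The abstract does not state the exact test configuration, such as which table was tested or whether any correction was applied, so the statistic computed here need not reproduce the reported P = 0.654. The function names `chi_squared_independence` and `p_value_df2` are hypothetical.

```python
import math


def chi_squared_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count for each cell under independence: row total * column total / N.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, expected


def p_value_df2(statistic):
    # With 2 degrees of freedom, the chi-squared survival function is exp(-x / 2).
    return math.exp(-statistic / 2)


# (correct, incorrect) out of 36 questions, ASSUMED: back-calculated from the
# reported accuracy rates of 52.8%, 38.9%, and 44.4%.
counts = {
    "ChatGPT": (19, 17),
    "Bard": (14, 22),
    "Bing": (16, 20),
}

table = [list(pair) for pair in counts.values()]
stat, dof, expected = chi_squared_independence(table)
assert dof == 2  # 3 chatbots x 2 outcomes -> (3 - 1) * (2 - 1)
p = p_value_df2(stat)

for name, (correct, incorrect) in counts.items():
    total = correct + incorrect
    print(f"{name}: {correct}/{total} = {correct / total:.1%}")
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.3f}")
```

For tables with other degrees of freedom, `scipy.stats.chi2_contingency` performs the same Pearson test; it applies Yates' continuity correction only when there is a single degree of freedom, so it does not alter a 3 × 2 table like this one.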
Keywords