Scientific Reports (Dec 2024)

A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis

  • Junxiu Zhang,
  • Yao Ma,
  • Rong Zhang,
  • Yanhua Chen,
  • Mengyao Xu,
  • Su Rina,
  • Ke Ma

DOI
https://doi.org/10.1038/s41598-024-80917-x
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 7

Abstract

Artificial intelligence (AI), particularly large language models such as GPT-4o, holds promise for enhancing diagnostic accuracy in healthcare. This study evaluates the diagnostic performance of GPT-4o compared with human ophthalmologists in glaucoma cases. A prospective, observational study was conducted at a tertiary care ophthalmology center. Twenty-six glaucoma cases, including both primary and secondary types, were selected from publicly available databases and institutional records. The cases were analyzed by GPT-4o and by three ophthalmologists with varying levels of experience. The accuracy and completeness of primary and differential diagnoses were assessed using 10-point and 6-point Likert scales, respectively. Statistical analyses were performed using nonparametric methods, including the Kruskal–Wallis and Mann–Whitney U tests. GPT-4o was significantly less accurate in primary diagnosis than the human ophthalmologists: it achieved a mean score of 5.500, compared with 8.038 for Doctor C, who had the highest score (p < 0.001). GPT-4o's completeness score (3.077) was also lower than that of Doctor B, who had the lowest score among the human ophthalmologists (3.615) (p < 0.001). For differential diagnosis, however, GPT-4o (7.577) showed accuracy comparable to that of Doctor A (7.615) and Doctor C (7.673) (p < 0.0001) while achieving the highest completeness score (4.096), outperforming Doctor C (3.846), Doctor A (2.923), and Doctor B (2.808) (p < 0.0001). AI, including GPT-4o, is currently not an acceptable standalone method for diagnosing glaucoma because its accuracy is lower than that of human clinicians. These findings suggest that GPT-4o could serve as a valuable adjunct in clinical practice, particularly in complex cases, but should not replace human expertise, especially for initial diagnoses. Future improvements in AI models could enhance their utility in ophthalmology.
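
The two nonparametric tests named in the abstract are both built on ranks of the pooled scores. The sketch below is an illustration only, not the authors' analysis code: the function names are hypothetical, the scores are made-up placeholders rather than data from the study, and only the test statistics are computed (no p-values and no tie correction for H). In practice a statistics package would normally be used instead.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x against sample y (two groups)."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def kruskal_wallis_h(*groups):
    """H statistic for three or more groups (no tie correction applied)."""
    pooled = list(chain.from_iterable(groups))
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


if __name__ == "__main__":
    # Placeholder Likert-style scores, NOT taken from the study.
    rater_1 = [5, 6, 4, 7, 5]
    rater_2 = [8, 9, 7, 8, 10]
    rater_3 = [7, 8, 6, 9, 8]
    print("H =", kruskal_wallis_h(rater_1, rater_2, rater_3))
    print("U =", mann_whitney_u(rater_1, rater_2))
```

A typical workflow runs the Kruskal–Wallis test across all raters first and then uses pairwise Mann–Whitney U tests to locate which raters differ; whether the study followed exactly this sequence is not stated in the abstract.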