EClinicalMedicine (Nov 2024)

Diagnostic performance of deep learning for infectious keratitis: a systematic review and meta-analysis

  • Zun Zheng Ong,
  • Youssef Sadek,
  • Riaz Qureshi,
  • Su-Hsun Liu,
  • Tianjing Li,
  • Xiaoxuan Liu,
  • Yemisi Takwoingi,
  • Viknesh Sounderajah,
  • Hutan Ashrafian,
  • Daniel S.W. Ting,
  • Jodhbir S. Mehta,
  • Saaeha Rauz,
  • Dalia G. Said,
  • Harminder S. Dua,
  • Matthew J. Burton,
  • Darren S.J. Ting

Journal volume & issue
Vol. 77
p. 102887

Abstract

Summary

Background: Infectious keratitis (IK) is the leading cause of corneal blindness globally. Deep learning (DL) is an emerging tool for medical diagnosis, though its value in IK is unclear. We aimed to assess the diagnostic accuracy of DL for IK and to compare it with that of ophthalmologists.

Methods: In this systematic review and meta-analysis, we searched EMBASE, MEDLINE, and clinical registries for studies related to DL for IK published between 1974 and July 16, 2024. We performed meta-analyses using bivariate models to estimate summary sensitivities and specificities. This systematic review was registered with PROSPERO (CRD42022348596).

Findings: Of 963 studies identified, 35 studies (136,401 corneal images from >56,011 patients) were included. Most studies had low risk of bias (68.6%) and low applicability concern (91.4%) in all domains of QUADAS-2, except the index test domain. Against the reference standard of expert consensus and/or microbiological results (seven external validation studies; 10,675 images), the summary estimates (95% CI) for sensitivity and specificity of DL for IK were 86.2% (71.6–93.9) and 96.3% (91.5–98.5). From 28 internal validation studies (16,059 images), summary estimates for sensitivity and specificity were 91.6% (86.8–94.8) and 90.7% (84.8–94.5). Based on seven studies (4007 images), DL and ophthalmologists had comparable summary sensitivity [89.2% (82.2–93.6) versus 82.2% (71.5–89.5); P = 0.20] and specificity [93.2% (85.5–97.0) versus 89.6% (78.8–95.2); P = 0.45].

Interpretation: DL models may have good diagnostic accuracy for IK and comparable performance to ophthalmologists. These findings should be interpreted with caution because the analysis was image-based and did not account for potential correlation within individuals, the study populations were relatively homogeneous, DL thresholds were not pre-specified, and external validation was limited. Future studies should improve their reporting, data diversity, external validation, transparency, and explainability to increase the reliability and generalisability of DL models for clinical deployment.

Funding: NIH, Wellcome Trust, MRC, Fight for Sight, BHP, and ESCRS.
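As an illustration of the quantities reported above, the sketch below computes sensitivity and specificity, each with a 95% confidence interval, from a single study's 2x2 table. It is not the bivariate meta-analytic model used in the review, which pools such tables across studies while modelling the correlation between sensitivity and specificity. The function names, the choice of the Wilson score interval, and the counts are all assumptions made for illustration; the counts are invented and do not come from any included study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    )
    return centre - half_width, centre + half_width


def diagnostic_accuracy(tp, fp, fn, tn):
    """Return (estimate, lower, upper) for sensitivity and specificity.

    tp/fn are counted among images with infectious keratitis according to
    the reference standard; tn/fp among images without it.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": (sensitivity, *wilson_interval(tp, tp + fn)),
        "specificity": (specificity, *wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts for one study, invented purely for illustration.
result = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=180)

for name, (estimate, lower, upper) in result.items():
    print(f"{name}: {estimate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

A per-image interval of this kind treats every image as independent, which is the same simplification flagged in the Interpretation: several images from one patient are likely correlated, so such intervals may be too narrow.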

Keywords