Diagnostic decisions of specialist optometrists exposed to ambiguous deep-learning outputs

Josie Carmichael; Enrico Costanza; Ann Blandford; Robbert Struyven; Pearse A. Keane; Konstantinos Balaskas

doi:10.1038/s41598-024-55410-0

Scientific Reports (Mar 2024)

Affiliations

Josie Carmichael: University College London Interaction Centre (UCLIC), UCL
Enrico Costanza: University College London Interaction Centre (UCLIC), UCL
Ann Blandford: University College London Interaction Centre (UCLIC), UCL
Robbert Struyven: Institute of Ophthalmology, NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust and UCL
Pearse A. Keane: Institute of Ophthalmology, NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust and UCL
Konstantinos Balaskas: Institute of Ophthalmology, NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust and UCL

DOI: https://doi.org/10.1038/s41598-024-55410-0
Journal volume & issue: Vol. 14, no. 1
pp. 1 – 12

Abstract

Artificial intelligence (AI) has great potential in ophthalmology. We investigated how ambiguous outputs from an AI diagnostic support system (AI-DSS) affected diagnostic responses from optometrists when assessing cases of suspected retinal disease. Thirty optometrists (15 more experienced, 15 less) assessed 30 clinical cases. For ten, participants saw an optical coherence tomography (OCT) scan, basic clinical information and retinal photography (‘no AI’). For another ten, they were also given AI-generated OCT-based probabilistic diagnoses (‘AI diagnosis’); and for ten, both AI-diagnosis and AI-generated OCT segmentations (‘AI diagnosis + segmentation’) were provided. Cases were matched across the three types of presentation and were selected to include 40% ambiguous and 20% incorrect AI outputs. Optometrist diagnostic agreement with the predefined reference standard was lowest for ‘AI diagnosis + segmentation’ (204/300, 68%) compared to ‘AI diagnosis’ (224/300, 75%, p = 0.010) and ‘no AI’ (242/300, 81%, p < 0.001). Agreement with AI diagnosis consistent with the reference standard decreased (174/210 vs 199/210, p = 0.003), but participants trusted the AI more (p = 0.029) with segmentations. Practitioner experience did not affect diagnostic responses (p = 0.24). More experienced participants were more confident (p = 0.012) and trusted the AI less (p = 0.038). Our findings also highlight issues around reference standard definition.
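The agreement rates quoted in the abstract can be checked from the reported counts. The sketch below is illustrative only and is not the authors' analysis: the dictionary layout and the `percent` helper are hypothetical names. The counts are taken from the abstract. The explanation of the denominator (30 participants each assessing 10 cases per presentation type) is an inference from the study design described there.

```python
# Counts as reported in the abstract: (agreements with the reference
# standard, total decisions) for each presentation type.
agreement = {
    "no AI": (242, 300),
    "AI diagnosis": (224, 300),
    "AI diagnosis + segmentation": (204, 300),
}


def percent(k: int, n: int) -> int:
    """Return k/n as a percentage rounded to the nearest whole number."""
    return round(100 * k / n)


# Hypothetical helper output: condition name -> rounded agreement rate.
rates = {name: percent(k, n) for name, (k, n) in agreement.items()}

for name, rate in rates.items():
    print(f"{name}: {rate}%")

# Assumed origin of the denominator: 30 participants x 10 cases per
# presentation type gives 300 decisions per condition.
decisions_per_condition = 30 * 10
```

The rounded rates match the percentages given in the abstract. The p-values cannot be reproduced from these counts alone, because the abstract does not name the statistical test used.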

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
