Health Technology Assessment (Dec 2016)

An observational study to assess if automated diabetic retinopathy image assessment software can replace one or more steps of manual imaging grading and to determine their cost-effectiveness

  • Adnan Tufail,
  • Venediktos V Kapetanakis,
  • Sebastian Salas-Vega,
  • Catherine Egan,
  • Caroline Rudisill,
  • Christopher G Owen,
  • Aaron Lee,
  • Vern Louw,
  • John Anderson,
  • Gerald Liew,
  • Louis Bolter,
  • Clare Bailey,
  • SriniVas Sadda,
  • Paul Taylor,
  • Alicja R Rudnicka

DOI
https://doi.org/10.3310/hta20920
Journal volume & issue
Vol. 20, no. 92

Abstract


Background: Diabetic retinopathy screening in England involves labour-intensive manual grading of retinal images. Automated retinal image analysis systems (ARIASs) may offer an alternative to manual grading.
Objectives: To determine the screening performance and cost-effectiveness of ARIASs in the NHS diabetic eye screening programme (DESP), used either to replace level 1 human graders or to pre-screen images before level 1 human grading. To examine technical issues associated with implementation.
Design: An observational retrospective measurement comparison study with a real-time evaluation of technical issues and a decision-analytic model to evaluate cost-effectiveness.
Setting: An NHS DESP.
Participants: Consecutive diabetic patients who attended a routine annual NHS DESP visit.
Interventions: Retinal images were manually graded and processed by three ARIASs: iGradingM (version 1.1; originally Medalytix Group Ltd, Manchester, UK, but purchased by Digital Healthcare, Cambridge, UK, at the initiation of the study, and in turn by EMIS Health, Leeds, UK, after the conclusion of the study), Retmarker (version 0.8.2, Retmarker Ltd, Coimbra, Portugal) and EyeArt (Eyenuk Inc., Woodland Hills, CA, USA). The final manual grade was used as the reference standard. A reading centre, masked to all grading, arbitrated a subset of discrepancies between manual grading and ARIAS output; this arbitration was used to create an additional reference standard, the manual grade modified by arbitration.
Main outcome measures: Screening performance (sensitivity, specificity, false-positive rate and likelihood ratios) and diagnostic accuracy [95% confidence intervals (CIs)] of ARIASs. A secondary analysis explored the influence of camera type and patients’ ethnicity, age and sex on screening performance. Economic analysis estimated the cost per appropriate screening outcome identified.
Results: A total of 20,258 patients with 102,856 images were entered into the study. The sensitivity point estimates of the ARIASs were as follows: EyeArt 94.7% (95% CI 94.2% to 95.2%) for any retinopathy, 93.8% (95% CI 92.9% to 94.6%) for referable retinopathy and 99.6% (95% CI 97.0% to 99.9%) for proliferative retinopathy; and Retmarker 73.0% (95% CI 72.0% to 74.0%) for any retinopathy, 85.0% (95% CI 83.6% to 86.2%) for referable retinopathy and 97.9% (95% CI 94.9% to 99.1%) for proliferative retinopathy. iGradingM classified all images as either ‘disease’ or ‘ungradable’, limiting further iGradingM analysis. The sensitivity and false-positive rates for EyeArt were not affected by ethnicity, sex or camera type, but sensitivity declined marginally with increasing patient age. The screening performance of Retmarker appeared to vary with patients’ age, ethnicity and camera type. Both EyeArt and Retmarker were cost saving relative to manual grading, whether used as a replacement for level 1 human grading or prior to level 1 human grading, although the latter was less cost-effective. When ARIASs were used to replace level 1 graders, a threshold analysis found that the highest ARIAS cost per patient before the ARIAS became more expensive per appropriate outcome than human grading was £3.82 for Retmarker and £2.71 for EyeArt.
Limitations: The non-randomised study design limited the health economic analysis, but the same retinal images were processed by all ARIASs in this measurement comparison study.
Conclusions: Retmarker and EyeArt achieved acceptable sensitivity for referable retinopathy and acceptable false-positive rates (compared with human graders as the reference standard) and appear to be cost-effective alternatives to a purely manual grading approach. Future work is required to develop technical specifications to optimise deployment and address potential governance issues.
Funding: The National Institute for Health Research (NIHR) Health Technology Assessment programme, a Fight for Sight Grant (Hirsch grant award) and the Department of Health’s NIHR Biomedical Research Centre for Ophthalmology at Moorfields Eye Hospital and the University College London Institute of Ophthalmology.
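The screening performance measures listed under Main outcome measures (sensitivity, specificity, false-positive rate and likelihood ratios, each with a 95% CI) are standard functions of a 2×2 table comparing ARIAS output with the reference-standard manual grade. The following is a minimal Python sketch of those calculations, not the study's analysis code. The names `TwoByTwo`, `wilson_ci` and `screening_performance` are hypothetical, the Wilson score interval is an assumed choice (the abstract does not state how the CIs were computed), and the counts in the example are made up rather than taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical 2x2 table: ARIAS result vs. reference-standard manual grade."""
    tp: int  # reference positive, ARIAS positive
    fn: int  # reference positive, ARIAS negative
    fp: int  # reference negative, ARIAS positive
    tn: int  # reference negative, ARIAS negative


def _ratio(num: float, den: float) -> float:
    # Return infinity rather than raising when the denominator is zero.
    return math.inf if den == 0 else num / den


def wilson_ci(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default).

    Using the Wilson method is an assumption; the abstract names no CI method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return centre - half, centre + half


def screening_performance(t: TwoByTwo) -> dict[str, float]:
    """Sensitivity, specificity, false-positive rate and likelihood ratios."""
    sensitivity = t.tp / (t.tp + t.fn)  # share of reference positives flagged
    specificity = t.tn / (t.tn + t.fp)  # share of reference negatives cleared
    fpr = t.fp / (t.fp + t.tn)  # equals 1 - specificity
    fnr = t.fn / (t.tp + t.fn)  # equals 1 - sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fpr,
        "lr_positive": _ratio(sensitivity, fpr),  # LR+ = sens / (1 - spec)
        "lr_negative": _ratio(fnr, specificity),  # LR- = (1 - sens) / spec
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the study.
    table = TwoByTwo(tp=90, fn=10, fp=20, tn=80)
    for name, value in screening_performance(table).items():
        print(f"{name}: {value:.3f}")
    lo, hi = wilson_ci(table.tp, table.tp + table.fn)
    print(f"sensitivity 95% CI: {lo:.3f} to {hi:.3f}")
```

The false-positive and false-negative rates are computed directly from the counts rather than as one minus specificity or sensitivity, which avoids floating-point round-off, and a zero denominator yields infinity for a likelihood ratio instead of raising an exception.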

Keywords