SynthEye: Investigating the Impact of Synthetic Data on Artificial Intelligence-assisted Gene Diagnosis of Inherited Retinal Disease

Yoga Advaith Veturi, MSc; William Woof, PhD; Teddy Lazebnik, PhD; Ismail Moghul, PhD; Peter Woodward-Court, PhD, MBBS; Siegfried K. Wagner, BMBCh; Thales Antonio Cabral de Guimarães, PhD, MD; Malena Daich Varela, PhD, MD; Bart Liefers, PhD; Praveen J. Patel, MBBChir MD(Res); Stephan Beck, PhD; Andrew R. Webster, FRCOphth; Omar Mahroo, PhD, MBBChir; Pearse A. Keane, MD, MB BCH BAO; Michel Michaelides, MD; Konstantinos Balaskas, MD; Nikolas Pontikos, PhD

Ophthalmology Science (Jun 2023)

Affiliations

Yoga Advaith Veturi, MSc: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
William Woof, PhD: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Teddy Lazebnik, PhD: University College London Cancer Institute, University College London, London, UK
Ismail Moghul, PhD: Moorfields Eye Hospital, London, UK
Peter Woodward-Court, PhD, MBBS: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Siegfried K. Wagner, BMBCh: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Thales Antonio Cabral de Guimarães, PhD, MD: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Malena Daich Varela, PhD, MD: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Bart Liefers, PhD: Moorfields Eye Hospital, London, UK
Praveen J. Patel, MBBChir MD(Res): Moorfields Eye Hospital, London, UK
Stephan Beck, PhD: University College London Cancer Institute, University College London, London, UK
Andrew R. Webster, FRCOphth: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Omar Mahroo, PhD, MBBChir: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Pearse A. Keane, MD, MB BCH BAO: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Michel Michaelides, MD: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Konstantinos Balaskas, MD: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Nikolas Pontikos, PhD: University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK

Correspondence: Nikolas Pontikos, PhD, University College London Institute of Ophthalmology, 11-43 Bath Street, London EC1V 9EL, UK. E-mail: [email protected]

Journal volume & issue: Vol. 3, no. 2, p. 100258

Abstract

Purpose: Rare disease diagnosis is challenging in medical image-based artificial intelligence because of a natural class imbalance in datasets, which leads to biased prediction models. Inherited retinal diseases (IRDs) are a research domain that particularly faces this issue. This study investigates whether synthetic data generated with generative adversarial networks (GANs) can improve artificial intelligence-enabled diagnosis of IRDs.

Design: Diagnostic study of gene-labeled fundus autofluorescence (FAF) IRD images using deep learning.

Participants: Moorfields Eye Hospital (MEH) dataset of 15 692 FAF images obtained from 1800 patients with a confirmed genetic diagnosis of 1 of 36 IRD genes.

Methods: A StyleGAN2 model is trained on the IRD dataset to generate 512 × 512 resolution images. Convolutional neural networks are trained for classification using different synthetically augmented datasets, including real IRD images plus either 1800 or 3600 synthetic images, and a fully rebalanced dataset. We also perform an experiment with only synthetic data. All models are compared against a baseline convolutional neural network trained only on real data.

Main Outcome Measures: We evaluated synthetic data quality using a Visual Turing Test conducted with 4 ophthalmologists from MEH. Synthetic and real images were compared using feature space visualization, similarity analysis to detect memorized images, and the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) score for no-reference quality evaluation. Convolutional neural network diagnostic performance was determined on a held-out test set using the area under the receiver operating characteristic curve (AUROC) and Cohen’s Kappa (κ).

Results: An average true recognition rate of 63% and fake recognition rate of 47% were obtained from the Visual Turing Test. Thus, a considerable proportion of the synthetic images were classified as real by clinical experts. Similarity analysis showed that the synthetic images were not copies of the real images, indicating that the GAN was able to generalize. However, BRISQUE score analysis indicated that synthetic images were of significantly lower quality overall than real images (P < 0.05). Comparing the rebalanced model (RB) with the baseline (R), no significant change in the average AUROC and κ was found (R-AUROC = 0.86 [0.85-0.88], RB-AUROC = 0.88 [0.86-0.89], R-κ = 0.51 [0.49-0.53], and RB-κ = 0.52 [0.50-0.54]). The model trained only on synthetic data (S) achieved performance similar to the baseline (S-AUROC = 0.86 [0.85-0.87], S-κ = 0.48 [0.46-0.50]).

Conclusions: Synthetic generation of realistic IRD FAF images is feasible. Synthetic data augmentation does not deliver improvements in classification performance. However, synthetic data alone deliver performance similar to that of real data, and hence may be useful as a proxy for real data.

Financial Disclosure(s): Proprietary or commercial disclosure may be found after the references.
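As an illustration of two ideas named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that "fully rebalanced" means topping up every gene class with synthetic images until it matches the largest real class (the abstract does not specify the scheme), and it computes Cohen's κ from its standard definition, κ = (p_o − p_e) / (1 − p_e). The function names and the toy class labels (`gene_A`, `gene_B`, `gene_C`) are hypothetical.

```python
from collections import Counter


def synthetic_needed(real_counts):
    """Synthetic images to add per class so each matches the largest class.

    Assumption: rebalancing tops up every class to the size of the
    largest real class; the source does not state the exact scheme.
    """
    target = max(real_counts.values())
    return {label: target - n for label, n in real_counts.items()}


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of items where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of marginal label frequencies, summed.
    true_freq = Counter(y_true)
    pred_freq = Counter(y_pred)
    p_e = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_o - p_e) / (1 - p_e)


# Toy, made-up class counts (not taken from the MEH dataset).
toy_counts = {"gene_A": 5, "gene_B": 3, "gene_C": 1}
top_up = synthetic_needed(toy_counts)

# Toy labels: 3 of 4 predictions agree with the reference labels.
kappa = cohen_kappa(["A", "A", "B", "B"], ["A", "A", "B", "A"])
```

With these toy inputs, `top_up` asks for 0, 2, and 4 synthetic images for the three classes, and `kappa` is 0.5 (observed agreement 0.75 against chance agreement 0.5).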

Published in Ophthalmology Science

ISSN: 2666-9145 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Ophthalmology
Website: https://www.journals.elsevier.com/ophthalmology-science/