Ophthalmology Science (Sep 2024)

Strong versus Weak Data Labeling for Artificial Intelligence Algorithms in the Measurement of Geographic Atrophy

  • Amitha Domalpally, MD, PhD,
  • Robert Slater, PhD,
  • Rachel E. Linderman, PhD,
  • Rohit Balaji,
  • Jacob Bogost,
  • Rick Voland, PhD,
  • Jeong Pak, PhD,
  • Barbara A. Blodi, MD,
  • Roomasa Channa, MD,
  • Donald Fong, MD,
  • Emily Y. Chew, MD

Journal volume & issue
Vol. 4, no. 5
p. 100477

Abstract

Purpose: To understand the data labeling requirements for training deep learning models to measure geographic atrophy (GA) on fundus autofluorescence (FAF) images.

Design: Evaluation of artificial intelligence (AI) algorithms.

Subjects: Age-Related Eye Disease Study 2 (AREDS2) images were used for training and cross-validation, and GA clinical trial images were used for testing.

Methods: Training data consisted of 2 sets of FAF images: 1 with area measurements only and no indication of GA location (Weakly labeled) and 1 with GA segmentation masks (Strongly labeled).

Main Outcome Measures: Bland–Altman plots and scatter plots were used to compare GA area measurements between ground truth and the AI models. The Dice coefficient was used to assess the segmentation accuracy of the Strongly labeled model.

Results: In the cross-validation AREDS2 data set (n = 601), the mean (standard deviation [SD]) GA area measured by the human grader, the Weakly labeled AI model, and the Strongly labeled AI model was 6.65 (6.3) mm², 6.83 (6.29) mm², and 6.58 (6.24) mm², respectively. The mean difference between ground truth and AI was 0.18 mm² (95% confidence interval [CI], −7.57 to 7.92) for the Weakly labeled model and −0.07 mm² (95% CI, −1.61 to 1.47) for the Strongly labeled model. In the GlaxoSmithKline testing data set (n = 156), the mean (SD) GA area was 9.79 (5.6) mm², 8.82 (4.61) mm², and 9.55 (5.66) mm² for the human grader, the Strongly labeled AI model, and the Weakly labeled AI model, respectively. The mean difference between ground truth and AI for the 2 models was −0.97 mm² (95% CI, −4.36 to 2.41) and −0.24 mm² (95% CI, −4.98 to 4.49), respectively. The Dice coefficient was 0.99 for intergrader agreement, 0.89 for the cross-validation data, and 0.92 for the testing data.

Conclusions: Deep learning models can achieve reasonable accuracy even with Weakly labeled data. Training methods that combine large volumes of Weakly labeled images with a small number of Strongly labeled images offer a promising solution to the cost and time burden of data labeling.

Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
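For readers unfamiliar with the 2 outcome measures, the sketch below shows how the Dice coefficient and the Bland–Altman statistics reported above are conventionally computed, and how a strong label (a pixel mask) implies the corresponding weak label (an area in mm²) once the pixel scale is known. This is an illustrative sketch only, not the authors' code; the function names, the pixel-scale conversion, and the 1.96 × SD convention for the 95% limits of agreement are assumptions.

    import numpy as np

    def dice_coefficient(mask_a, mask_b):
        # Dice similarity between two binary segmentation masks:
        # 2 * |A ∩ B| / (|A| + |B|); defined as 1.0 when both masks are empty.
        mask_a = np.asarray(mask_a, dtype=bool)
        mask_b = np.asarray(mask_b, dtype=bool)
        total = mask_a.sum() + mask_b.sum()
        if total == 0:
            return 1.0
        return 2.0 * np.logical_and(mask_a, mask_b).sum() / total

    def mask_area_mm2(mask, mm_per_pixel):
        # A strong label (pixel mask) determines the weak label (GA area in mm²)
        # given the image scale; the reverse does not hold.
        return np.asarray(mask, dtype=bool).sum() * mm_per_pixel ** 2

    def bland_altman_stats(ground_truth, predicted):
        # Mean difference and 95% limits of agreement (mean ± 1.96 SD),
        # the quantities summarized in the Bland–Altman comparisons above.
        diff = np.asarray(predicted, dtype=float) - np.asarray(ground_truth, dtype=float)
        mean_diff = diff.mean()
        sd = diff.std(ddof=1)
        return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd

    # Example with made-up GA areas (mm²) from a grader and an AI model.
    grader = [6.1, 8.4, 2.3, 11.0]
    model = [6.0, 8.9, 2.1, 10.5]
    print(bland_altman_stats(grader, model))

Because a mask always yields an area but an area never yields a mask, strongly labeled data can supervise both tasks, whereas weakly labeled data can supervise only the area measurement; this asymmetry is what makes the mixed-labeling strategy in the Conclusions attractive.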
