Scientific Reports (Jan 2023)

Assessing the external validity of machine learning-based detection of glaucoma

  • Chi Li,
  • Jacqueline Chua,
  • Florian Schwarzhans,
  • Rahat Husain,
  • Michaël J. A. Girard,
  • Shivani Majithia,
  • Yih-Chung Tham,
  • Ching-Yu Cheng,
  • Tin Aung,
  • Georg Fischer,
  • Clemens Vass,
  • Inna Bujor,
  • Chee Keong Kwoh,
  • Alina Popa-Cherecheanu,
  • Leopold Schmetterer,
  • Damon Wong

DOI
https://doi.org/10.1038/s41598-023-27783-1
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 9

Abstract

Studies using machine learning (ML) approaches have reported high diagnostic accuracies for glaucoma detection. However, none assessed model performance across ethnicities. The aim of this study was to externally validate ML models for glaucoma detection from optical coherence tomography (OCT) data. We performed a prospective, cross-sectional study in which 514 Asians (257 glaucoma/257 controls) were enrolled to construct ML models for glaucoma detection, which were then tested on 356 Asians (183 glaucoma/173 controls) and 138 Caucasians (57 glaucoma/81 controls). We built two classifiers: one on the retinal nerve fibre layer (RNFL) thickness values produced by the compensation model, a multiple regression model fitted on healthy subjects that corrects the RNFL profile for anatomical factors, and one on the original (measured) OCT data. In the Asian dataset, both ML models (area under the receiver operating characteristic curve [AUC] = 0.96 and accuracy = 92%) outperformed the measured data (AUC = 0.93; P < 0.001) for glaucoma detection. In the Caucasian dataset, however, the ML model trained with compensated data (AUC = 0.93 and accuracy = 84%) outperformed both the ML model trained with original data (AUC = 0.83 and accuracy = 79%; P < 0.001) and the measured data (AUC = 0.82; P < 0.001). The performance of the ML model trained on measured data showed poor reproducibility across datasets, whereas the performance of the model trained on compensated data was maintained. Care must be taken when ML models are applied to patient cohorts of different ethnicities.
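
To make the idea of compensation and external validation concrete, the following is a minimal sketch on synthetic data, not the authors' implementation. It assumes a single hypothetical anatomical covariate, a plain least-squares regression fitted on healthy eyes only, and a single RNFL thickness value used directly as the detection score in place of the ML classifiers described above. All names, coefficients and cohort parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, cov_mean, cov_sd):
    """Hypothetical cohort: one anatomical covariate shifts RNFL; glaucoma thins it."""
    labels = rng.integers(0, 2, size=n)            # 1 = glaucoma, 0 = control
    cov = rng.normal(cov_mean, cov_sd, size=(n, 1))
    rnfl = 100.0 + 8.0 * cov[:, 0] - 15.0 * labels + rng.normal(0.0, 5.0, size=n)
    return cov, rnfl, labels

def fit_compensation(covariates, rnfl):
    """Least-squares regression of RNFL thickness on covariates (healthy eyes only)."""
    X = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(X, rnfl, rcond=None)
    return coef, covariates.mean(axis=0)           # coefficients and reference anatomy

def compensate(coef, ref_cov, covariates, rnfl):
    """Subtract the part of RNFL explained by deviation from the reference anatomy."""
    return rnfl - (covariates - ref_cov) @ coef[1:]

def auc(scores, labels):
    """AUC as P(score of a glaucoma eye > score of a control), ties counted as half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Development cohort and an external cohort with a shifted covariate distribution.
cov_dev, rnfl_dev, y_dev = simulate(600, cov_mean=0.0, cov_sd=1.0)
cov_ext, rnfl_ext, y_ext = simulate(400, cov_mean=1.0, cov_sd=1.5)

# Fit the compensation model on healthy development eyes only.
healthy = y_dev == 0
coef, ref_cov = fit_compensation(cov_dev[healthy], rnfl_dev[healthy])

# Thinner RNFL suggests glaucoma, so the negated thickness serves as the score.
auc_measured = auc(-rnfl_ext, y_ext)
auc_compensated = auc(-compensate(coef, ref_cov, cov_ext, rnfl_ext), y_ext)
print(f"external AUC, measured: {auc_measured:.2f}, compensated: {auc_compensated:.2f}")
```

In this toy setting the covariate adds variance unrelated to disease, so removing its estimated contribution tightens the separation between groups in the external cohort. The real study works with RNFL profiles and trained classifiers, so the numbers here carry no relation to the reported results.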