Ophthalmology Science (Mar 2024)

Machine Learning to Predict Faricimab Treatment Outcome in Neovascular Age-Related Macular Degeneration

  • Yusuke Kikuchi, PhD,
  • Michael G. Kawczynski, MS,
  • Neha Anegondi, MTech,
  • Ales Neubert, PhD,
  • Jian Dai, PhD,
  • Daniela Ferrara, MD, PhD,
  • Carlos Quezada-Ruiz, MD

Journal volume & issue
Vol. 4, no. 2
p. 100385

Abstract

Purpose: To develop machine learning (ML) models to predict, at baseline, treatment outcomes at month 9 in patients with neovascular age-related macular degeneration (nAMD) receiving faricimab.

Design: Retrospective proof-of-concept study.

Participants: Patients enrolled in the phase II AVENUE trial (NCT02484690) of faricimab in nAMD.

Methods: Baseline characteristics and spectral-domain OCT (SD-OCT) image data from 185 faricimab-treated eyes were split into 80% training and 20% test sets at the patient level. Input variables were baseline age, sex, best-corrected visual acuity (BCVA), central subfield thickness (CST), low luminance deficit, treatment arm, and SD-OCT images. A regression problem (BCVA) and a binary classification problem (reduction of CST by 35%) were considered. Overall, 10 models were developed and tested for each problem. Benchmark classical ML models (linear, random forest, extreme gradient boosting) were trained on baseline characteristics; benchmark deep neural networks (DNNs) were trained on baseline SD-OCT B-scans. Baseline characteristics and SD-OCT data were merged using 2 approaches: model stacking (using the DNN prediction as an input feature for classical ML models) and model averaging (averaging the predictions from the DNN using the SD-OCT volume and from classical ML models using baseline characteristics).

Main Outcome Measures: Treatment outcomes were defined by 2 target variables at month 9: a functional outcome (BCVA letter score) and an anatomical outcome (percent decrease in CST from baseline).

Results: The best-performing BCVA regression model with respect to the test coefficient of determination (R²) was the linear model in the model-stacking approach, with an R² of 0.31. The best-performing CST classification model with respect to the test area under the receiver operating characteristic curve (AUROC) was the benchmark linear model, with an AUROC of 0.87. A post hoc analysis showed that baseline BCVA and baseline CST had the greatest effect on predictions across the models for BCVA regression and CST classification, respectively.

Conclusions: Promising signals for predicting treatment outcomes from baseline characteristics were detected; however, the predictive benefit of baseline images was unclear in this proof-of-concept study. Further testing and validation with larger, independent datasets are required to fully explore the predictive capacity of ML models using baseline imaging data.

Financial Disclosure(s): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
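
The data handling described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: all function names are hypothetical, the "at least 35%" reading of the CST threshold is an assumption, and simple lists stand in for the actual models' predictions. It shows a patient-level split, the binary CST label, and the two ways of merging image-based and tabular predictions (stacking and averaging).

```python
import random


def split_by_patient(eye_to_patient, test_fraction=0.2, seed=0):
    """Assign whole patients to train or test so no patient spans both sets."""
    patients = sorted(set(eye_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train = [e for e, p in eye_to_patient.items() if p not in test_patients]
    test = [e for e, p in eye_to_patient.items() if p in test_patients]
    return train, test


def cst_responder(baseline_cst, month9_cst, threshold=0.35):
    """Binary label: True if CST fell by at least `threshold` of baseline.

    Assumption: the 35% cutoff is inclusive; the abstract does not say.
    """
    return (baseline_cst - month9_cst) / baseline_cst >= threshold


def stack_features(tabular_rows, image_pred):
    """Model stacking: append the image model's prediction as an extra feature."""
    return [list(row) + [p] for row, p in zip(tabular_rows, image_pred)]


def average_predictions(tabular_pred, image_pred):
    """Model averaging: unweighted mean of the two models' predictions."""
    return [(t + i) / 2 for t, i in zip(tabular_pred, image_pred)]
```

In the stacking variant, a classical model would then be fitted on the rows returned by `stack_features`; in the averaging variant, the two models are trained separately and only their outputs are combined.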

Keywords