BioMedical Engineering OnLine (May 2023)
Does multidimensional daily information predict the onset of myopia? A 1-year prospective cohort study
Abstract
Purpose: This study aimed to develop an interpretable machine learning model to predict the onset of myopia from individual daily information.
Method: In this prospective cohort study, non-myopic children aged 6–13 years were recruited at baseline, and individual data were collected by interviewing students and parents. One year after baseline, the incidence of myopia was evaluated by visual acuity testing and cycloplegic refraction measurement. Five algorithms (Random Forest, Support Vector Machine, Gradient Boosting Decision Tree, CatBoost and Logistic Regression) were used to develop models, and their performance was validated using the area under the curve (AUC). Shapley Additive exPlanations (SHAP) was applied to interpret the model output at the individual and global levels.
Result: Of 2221 children, 260 (11.7%) developed myopia within 1 year. In univariable analysis, 26 features were associated with myopia incidence. The CatBoost algorithm achieved the highest AUC, 0.951, in model validation. The top three features for predicting myopia were parental myopia, grade and frequency of eye fatigue. A compact model using only 10 features was validated with an AUC of 0.891.
Conclusion: Daily information provided reliable predictors of childhood myopia onset. The interpretable CatBoost model showed the best prediction performance, and oversampling greatly improved model performance. The model could serve as a tool for myopia prevention and intervention, helping to identify children at risk of myopia and to provide personalized prevention strategies based on the contribution of each risk factor to the individual prediction.
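The two quantitative ideas the abstract relies on, AUC as the validation metric and oversampling of the minority (myopia) class, can be illustrated with a minimal standard-library sketch. This is not the study's pipeline: the abstract does not say which oversampling technique was used, so simple random oversampling is assumed here, and the function names `auc` and `random_oversample` and the toy labels and scores are hypothetical.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are balanced.

    Assumption: plain random oversampling; the study may have used another method.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([y] * target)
    return out_rows, out_labels


# Incidence reported in the abstract: 260 of 2221 children.
incidence_pct = round(100 * 260 / 2221, 1)

# Hypothetical toy data: 7 non-myopic (0) and 2 myopic (1) children.
toy_labels = [0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.65, 0.9]
toy_auc = auc(toy_labels, toy_scores)

balanced_rows, balanced_labels = random_oversample(list(range(9)), toy_labels)
```

In practice, oversampling is normally applied to the training split only, so that duplicated minority-class rows cannot leak into the data used to compute the validation AUC.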
Keywords