BMC Medical Informatics and Decision Making (Nov 2024)
Explainable machine learning model for predicting the risk of significant liver fibrosis in patients with diabetic retinopathy
Abstract
Background Diabetic retinopathy (DR), a prevalent complication of type 2 diabetes, has attracted increasing attention, and recent studies suggest an association between retinopathy and significant liver fibrosis. This study aimed to develop a machine learning (ML) model from clinical data to predict the risk of significant liver fibrosis in patients with retinopathy, and to interpret the model using the SHapley Additive exPlanations (SHAP) method.
Methods Data were drawn from the National Health and Nutrition Examination Survey 2005–2008 cohort. Liver fibrosis was graded (F0-F4) using the Fibrosis-4 index (FIB-4). Retinopathy severity was determined from retinal imaging and classified into four grades. Eight ML methods were used to predict liver fibrosis and were evaluated with ten-fold cross-validation: Extreme Gradient Boosting, Random Forest, multilayer perceptron, Support Vector Machines, Logistic Regression (LR), Naive Bayes, Decision Tree, and k-nearest neighbors. Model performance was assessed using metrics such as the area under the curve (AUC). The SHAP method was used to rank feature importance and explain the model's predictions.
Results The analysis included 5,364 participants, of whom 2,116 (39.45%) had significant liver fibrosis. After random allocation, 3,754 participants were assigned to the training set and 1,610 to the validation set. Nine variables were selected for inclusion in the ML model. Among the eight ML models, the LR model achieved the highest AUC (0.867, 95% CI: 0.855–0.878) and the highest F1 score (0.749, 95% CI: 0.732–0.767). In internal validation, the LR model remained the best performer, with an AUC of 0.850 and an F1 score of 0.736, surpassing all other ML models. The SHAP method identified the most influential predictors through importance ranking.
Conclusion ML models built from clinical data can identify the risk of significant liver fibrosis in patients with retinopathy and thereby support early intervention.
Practice implications Improved early detection of liver fibrosis risk in patients with retinopathy can improve the outcomes of clinical intervention.
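For readers unfamiliar with the quantities named in the abstract, the following is a minimal illustrative sketch, not the authors' code. It computes the FIB-4 index from its standard formula, (age × AST) / (platelet count × √ALT), and the two reported metrics, AUC and F1 score, using only the Python standard library. The function and parameter names are hypothetical, and the abstract does not state which FIB-4 cutoff defined significant fibrosis, so no threshold is assumed here.

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """FIB-4 index: (age x AST) / (platelets x sqrt(ALT)).

    Units assumed: age in years, AST and ALT in U/L, platelets in 10^9/L.
    """
    return (age_years * ast_u_per_l) / (
        platelets_1e9_per_l * math.sqrt(alt_u_per_l)
    )


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = positive)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

In a workflow like the one described, a FIB-4 value would supply the outcome label for each participant, and the metrics would be computed on held-out folds for each candidate model. The choice of fold assignment, cutoff, and features is left to the study protocol.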
Keywords