BMC Medical Informatics and Decision Making (Jul 2024)

Interpretable machine learning models for detecting peripheral neuropathy and lower extremity arterial disease in diabetics: an analysis of critical shared and unique risk factors

  • Ya Wu,
  • Danmeng Dong,
  • Lijie Zhu,
  • Zihong Luo,
  • Yang Liu,
  • Xiaoyun Xie

DOI
https://doi.org/10.1186/s12911-024-02595-z
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 13

Abstract

Background: Diabetic peripheral neuropathy (DPN) and lower extremity arterial disease (LEAD) are significant contributors to diabetic foot ulcers (DFUs), which severely affect patients’ quality of life. This study aimed to develop machine learning (ML) predictive models for DPN and LEAD and to identify both shared and distinct risk factors.

Methods: This retrospective study included 479 diabetic inpatients, of whom 215 were diagnosed with DPN and 69 with LEAD. Clinical data and laboratory results were collected for each patient. Three feature selection methods were used to identify the most important features: mutual information (MI), random forest recursive feature elimination (RF-RFE), and the Boruta algorithm. Predictive models were developed using logistic regression (LR), random forest (RF), and eXtreme Gradient Boosting (XGBoost), with particle swarm optimization (PSO) used to optimize their hyperparameters. The SHapley Additive exPlanation (SHAP) method was applied to determine the importance of risk factors in the top-performing models.

Results: For diagnosing DPN, the XGBoost model was most effective, achieving a recall of 83.7%, specificity of 86.8%, accuracy of 85.4%, and an F1 score of 83.7%. For diagnosing LEAD, the RF model performed best, with a recall of 85.7%, specificity of 92.9%, accuracy of 91.9%, and an F1 score of 82.8%. SHAP analysis revealed the top five critical risk factors shared by DPN and LEAD: increased urinary albumin-to-creatinine ratio (UACR), glycosylated hemoglobin (HbA1c), serum creatinine (Scr), older age, and carotid stenosis. Distinct risk factors were also identified: decreased serum albumin and lower lymphocyte count were linked to DPN, while elevated neutrophil-to-lymphocyte ratio (NLR) and higher D-dimer levels were associated with LEAD.

Conclusions: This study demonstrated the effectiveness of ML models in predicting DPN and LEAD in diabetic patients and identified significant risk factors. Focusing on shared risk factors may greatly reduce the prevalence of both conditions, thereby mitigating the risk of developing DFUs.
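
The recall, specificity, accuracy, and F1 score reported in the abstract are standard quantities derived from a binary confusion matrix. The sketch below shows how they relate to one another. The function name `classification_metrics` and the counts in the usage example are hypothetical, chosen only for illustration; they are not taken from the study, and the authors' own evaluation code is not shown in the abstract.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute recall, specificity, accuracy, precision, and F1.

    tp, fn, fp, tn are the counts of true positives, false negatives,
    false positives, and true negatives. Assumes every denominator
    below is non-zero (both classes present, at least one positive
    prediction).
    """
    recall = tp / (tp + fn)            # share of diseased patients detected
    specificity = tn / (tn + fp)       # share of healthy patients correctly cleared
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)         # share of positive predictions that are correct
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and recall, an F1 score equal to the recall, as reported for the DPN model, implies that precision and recall coincide at that value.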

Keywords