Frontiers in Endocrinology (Jan 2024)

Development and validation of machine learning-augmented algorithm for insulin sensitivity assessment in the community and primary care settings: a population-based study in China

  • Hao Zhang,
  • Tianshu Zeng,
  • Jiaoyue Zhang,
  • Juan Zheng,
  • Jie Min,
  • Miaomiao Peng,
  • Geng Liu,
  • Xueyu Zhong,
  • Ying Wang,
  • Kangli Qiu,
  • Shenghua Tian,
  • Xiaohuan Liu,
  • Hantao Huang,
  • Marina Surmach,
  • Ping Wang,
  • Xiang Hu,
  • Lulu Chen

DOI
https://doi.org/10.3389/fendo.2024.1292346
Journal volume & issue
Vol. 15

Abstract

Objective: Insulin plays a central role in the regulation of energy and glucose homeostasis, and insulin resistance (IR) is widely regarded as the “common soil” of a cluster of cardiometabolic disorders. Assessing insulin sensitivity is therefore important for preventing and treating IR-related disease. This study aims to develop and validate machine learning (ML)-augmented algorithms for insulin sensitivity assessment in community and primary care settings.

Methods: We analyzed data from 9358 participants over 40 years old in the population-based cohort of the Hubei center of the REACTION study (Risk Evaluation of Cancers in Chinese Diabetic Individuals). Three non-ensemble algorithms and four ensemble algorithms were used to develop models with 70 non-laboratory variables for the community setting and 87 variables (70 non-laboratory and 17 laboratory) for the primary care setting, in order to identify the best-performing classifier. The best-performing models were then streamlined using the top-ranked 5, 8, 10, 13, 15, and 20 features. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR), and the Brier score. Shapley additive explanation (SHAP) analysis was used to evaluate feature importance and interpret the models.

Results: The LightGBM models developed for the community setting (AUROC 0.794, AUPR 0.575, Brier score 0.145) and the primary care setting (AUROC 0.867, AUPR 0.705, Brier score 0.119) outperformed the models built with the other six algorithms. The streamlined LightGBM models using the 20 top-ranked variables performed nearly as well in the community setting (AUROC 0.791, AUPR 0.563, Brier score 0.146) and the primary care setting (AUROC 0.863, AUPR 0.692, Brier score 0.124). SHAP analysis indicated that the top-ranked features included fasting plasma glucose (FPG), waist circumference (WC), body mass index (BMI), triglycerides (TG), gender, waist-to-height ratio (WHtR), the number of daughters born, and resting pulse rate (RPR), among others.

Conclusion: ML models based on the LightGBM algorithm can predict insulin sensitivity accurately in community and primary care settings and might become an efficient and practical tool for insulin sensitivity assessment in these settings.
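
For readers unfamiliar with the three metrics named in the abstract, the following is a minimal pure-Python sketch of how AUROC, AUPR (computed here as step-wise average precision), and the Brier score can be obtained from binary labels and predicted probabilities. It is not the study's code: the function names and the toy labels and scores are hypothetical, and the average-precision helper assumes no tied scores.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Hypothetical toy data: 1 = positive class, 0 = negative class.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

auc_value = auroc(y_true, y_prob)
ap_value = average_precision(y_true, y_prob)
brier_value = brier_score(y_true, y_prob)
print(f"AUROC={auc_value:.3f}  AUPR={ap_value:.3f}  Brier={brier_value:.3f}")
```

Higher AUROC and AUPR indicate better discrimination, while a lower Brier score indicates better-calibrated probabilities, which is why the abstract reports all three side by side.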

Keywords