BMC Neurology (Jan 2024)

Comparison of multiple linear regression and machine learning methods in predicting cognitive function in older Chinese type 2 diabetes patients

  • Chi-Hao Liu,
  • Chung-Hsin Peng,
  • Li-Ying Huang,
  • Fang-Yu Chen,
  • Chun-Heng Kuo,
  • Chung-Ze Wu,
  • Yu-Fang Cheng

DOI
https://doi.org/10.1186/s12883-023-03507-w
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 12

Abstract

Read online

Abstract Introduction The prevalence of type 2 diabetes (T2D) has increased dramatically in recent decades, and there are increasing indications that dementia is related to T2D. Previous attempts to analyze such relationships principally relied on traditional multiple linear regression (MLR). However, recently developed machine learning methods (Mach-L) outperform MLR in capturing non-linear relationships. The present study applied four different Mach-L methods to analyze the relationships between risk factors and cognitive function in older T2D patients, seeking to compare the accuracy between MLR and Mach-L in predicting cognitive function and to rank the importance of risks factors for impaired cognitive function in T2D. Methods We recruited older T2D between 60–95 years old without other major comorbidities. Demographic factors and biochemistry data were used as independent variables and cognitive function assessment (CFA) was conducted using the Montreal Cognitive Assessment as an independent variable. In addition to traditional MLR, we applied random forest (RF), stochastic gradient boosting (SGB), Naïve Byer’s classifier (NB) and eXtreme gradient boosting (XGBoost). Results Totally, the test cohort consisted of 197 T2D (98 men and 99 women). Results showed that all ML methods outperformed MLR, with symmetric mean absolute percentage errors for MLR, RF, SGB, NB and XGBoost respectively of 0.61, 0.599, 0.606, 0.599 and 0.2139. Education level, age, frailty score, fasting plasma glucose and body mass index were identified as key factors in descending order of importance. Conclusion In conclusion, our study demonstrated that RF, SGB, NB and XGBoost are more accurate than MLR for predicting CFA score, and identify education level, age, frailty score, fasting plasma glucose, body fat and body mass index as important risk factors in an older Chinese T2D cohort.

Keywords