IEEE Access (Jan 2019)
Metabolic Syndrome and Development of Diabetes Mellitus: Predictive Modeling Based on Machine Learning Techniques
Abstract
The objectives of this inductive research were to investigate: 1) the relationship between diabetes mellitus and individual risk factors of metabolic syndrome (MetS) in a non-conservative setting; 2) the prediction of future onset of diabetes using relevant risk factors of MetS; and 3) the relative performance of machine learning methods when data sampling techniques are used to generate balanced training sets. The dataset used in this research contains 667 907 records covering the period from 2003 to 2013. To quantify the contribution of individual risk factors of MetS to the development of diabetes in a non-conservative setting, logistic regression analysis was performed. Our analyses contradict the view that diabetes is commonly associated with low levels of high-density lipoprotein (HDL). Instead, our results show that increased levels of HDL are positively correlated with diabetes onset, particularly in women. We also proposed the J48 decision tree and Naïve Bayes methods for predicting future onset of diabetes, using the relevant risk factors obtained from the logistic regression analysis, over balanced and unbalanced datasets. The results demonstrated the superiority of Naïve Bayes with the K-medoids under-sampling technique over random under-sampling, oversampling, and no sampling. It achieved, on average, 79% receiver operating characteristic performance with an increased true positive rate. The results of this paper call for further research to clarify the pathophysiological significance of HDL and pathways in the development of diabetes.
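As an illustration of how a balanced training set can be produced, the following is a minimal sketch of random under-sampling, one of the baseline sampling techniques named in the abstract. It is not the authors' implementation: the function name `random_undersample`, its parameters, and the toy data are hypothetical, and the K-medoids variant reported as the best performer is not shown.

```python
import random
from collections import defaultdict


def random_undersample(records, labels, seed=0):
    """Return a class-balanced subset of (records, labels).

    Hypothetical helper for illustration only: every class is randomly
    reduced to the size of the smallest class.
    """
    rng = random.Random(seed)

    # Group the records by their class label.
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(rec)

    # Size of the smallest (minority) class.
    n_min = min(len(recs) for recs in by_class.values())

    # Draw n_min records without replacement from each class.
    balanced = []
    for lab, recs in by_class.items():
        for rec in rng.sample(recs, n_min):
            balanced.append((rec, lab))

    # Shuffle so that classes are interleaved in the output.
    rng.shuffle(balanced)
    xs = [rec for rec, _ in balanced]
    ys = [lab for _, lab in balanced]
    return xs, ys


# Toy data (made up): 10 positive and 90 negative records.
X = [[i] for i in range(100)]
y = [1 if i < 10 else 0 for i in range(100)]
X_bal, y_bal = random_undersample(X, y, seed=42)
```

In this sketch all minority-class records are retained and the majority class is thinned at random, so the balanced set contains equal numbers of each class.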
Keywords