Applied Sciences (Jan 2022)
Early Risk Prediction of Diabetes Based on GA-Stacking
Abstract
Early risk prediction of diabetes can help doctors and patients recognize the disease and intervene as soon as possible, which can effectively reduce the risk of complications. In this paper, a GA-stacking ensemble learning model is proposed to improve the accuracy of diabetes risk prediction. First, a genetic algorithm (GA) based on a decision tree (DT) is used to select individuals with high fitness, that is, subsets of attributes suitable for diabetes risk prediction. Second, an optimized convolutional neural network (CNN) and a support vector machine (SVM) are used as the primary learners of the stacking model to learn from the selected attribute subsets. Then, the outputs of the CNN and SVM are used as the input of the meta-learner, a fully connected layer, for classification. De-identified physical examination data from Qingdao, collected from 1 January 2017 to 31 December 2019, are used; the data include body temperature, BMI, waist circumference, and other indicators that may be related to early diabetes. We compare GA-stacking with K-nearest neighbor (KNN), SVM, logistic regression (LR), Naive Bayes (NB), and CNN, before and after adding GA, in terms of average prediction time, accuracy, precision, sensitivity, specificity, and F1-score. The results show that adding GA improves prediction efficiency and that GA-stacking achieves higher prediction accuracy. Moreover, the strong generalization ability and high prediction efficiency of GA-stacking are also verified on the early-stage diabetes risk prediction dataset published by UCI.
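The evaluation metrics named above follow the standard confusion-matrix definitions for binary classification. As a minimal sketch, assuming labels are encoded as 1 (diabetes risk) and 0 (no risk), they could be computed as shown below. The function names `confusion_counts` and `binary_metrics` are hypothetical and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed encoding: 1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Return the standard metrics; a ratio with a zero denominator is reported as 0.0."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative labels only; these are not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(*confusion_counts(y_true, y_pred)))
```

Sensitivity and specificity are reported separately because, in a screening setting, missed positive cases and false alarms carry different costs, which accuracy alone does not distinguish.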
Keywords