Early Risk Prediction of Diabetes Based on GA-Stacking

Yaqi Tan; He Chen; Jianjun Zhang; Ruichun Tang; Peishun Liu

doi:10.3390/app12020632

Applied Sciences (Jan 2022)

Affiliations

Yaqi Tan: College of Information Science and Technology, Ocean University of China, Qingdao 266100, China
He Chen: College of Information Science and Technology, Ocean University of China, Qingdao 266100, China
Jianjun Zhang: Qingdao Center for Disease Control and Prevention, Qingdao 266033, China
Ruichun Tang: College of Information Science and Technology, Ocean University of China, Qingdao 266100, China
Peishun Liu: College of Information Science and Technology, Ocean University of China, Qingdao 266100, China

Journal volume & issue: Vol. 12, no. 2, p. 632

Abstract

Early risk prediction of diabetes can help doctors and patients pay attention to the disease and intervene as early as possible, which can effectively reduce the risk of complications. In this paper, a GA-stacking ensemble learning model is proposed to improve the accuracy of diabetes risk prediction. First, a genetic algorithm (GA) based on a decision tree (DT) is used to select individuals with high fitness, that is, attribute subsets suitable for diabetes risk prediction. Second, an optimized convolutional neural network (CNN) and a support vector machine (SVM) are used as the primary learners of stacking, each trained on the selected attribute subset. Then, the outputs of the CNN and SVM are used as the input of the meta-learner, a fully connected layer, for classification. The experiments use desensitized (de-identified) physical examination data from Qingdao collected from 1 January 2017 to 31 December 2019, which include body temperature, BMI, waist circumference, and other indicators that may be related to early diabetes. We compared GA-stacking with K-nearest neighbor (KNN), SVM, logistic regression (LR), Naive Bayes (NB), and CNN, both before and after adding GA, in terms of average prediction time, accuracy, precision, sensitivity, specificity, and F1-score. The results show that adding GA improves prediction efficiency and that GA-stacking achieves higher prediction accuracy than the compared models. Moreover, the strong generalization ability and high prediction efficiency of GA-stacking were also verified on the early-stage diabetes risk prediction dataset published in the UCI Machine Learning Repository.
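
The GA-based attribute selection described above can be illustrated with a minimal pure-Python sketch. Each individual is a 0/1 mask over the attributes, and its fitness is the accuracy of a decision tree trained only on the masked attributes, minus a small per-attribute penalty. Everything specific here is an assumption for illustration rather than the paper's configuration: the depth-2 tree with a misclassification-count split rule, the 0.01 penalty, size-2 tournament selection, one-point crossover, bit-flip mutation, the hypothetical helper names (build_tree, fitness, ga_select), and the synthetic data. For brevity the same rows are used for training and validation; a real pipeline would score fitness on held-out data.

```python
import itertools
import random
from collections import Counter

# Hypothetical data layout: a list of (tuple of 0/1 attribute values, 0/1 label) pairs.
Dataset = list[tuple[tuple[int, ...], int]]


def leaf_errors(labels: list[int]) -> int:
    """Misclassifications if these labels were all predicted as their majority class."""
    return len(labels) - Counter(labels).most_common(1)[0][1]


def build_tree(data: Dataset, features: list[int], depth: int):
    """Greedy depth-limited decision tree on 0/1 attributes (illustrative split rule).

    A leaf is an int label; an internal node is (feature, subtree_if_0, subtree_if_1).
    """
    labels = [y for _, y in data]
    if depth == 0 or not features or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = None
    for f in features:
        lo = [y for x, y in data if x[f] == 0]
        hi = [y for x, y in data if x[f] == 1]
        if lo and hi:  # skip attributes that do not split this node
            errors = leaf_errors(lo) + leaf_errors(hi)
            if best is None or errors < best[0]:
                best = (errors, f)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    f = best[1]
    rest = [g for g in features if g != f]
    return (
        f,
        build_tree([(x, y) for x, y in data if x[f] == 0], rest, depth - 1),
        build_tree([(x, y) for x, y in data if x[f] == 1], rest, depth - 1),
    )


def predict(tree, x: tuple[int, ...]) -> int:
    while isinstance(tree, tuple):  # walk down until a leaf label is reached
        f, if0, if1 = tree
        tree = if1 if x[f] == 1 else if0
    return tree


def fitness(mask, train: Dataset, valid: Dataset, depth=2, penalty=0.01) -> float:
    """DT accuracy on the selected attributes, minus an assumed per-attribute penalty."""
    features = [i for i, bit in enumerate(mask) if bit]
    if not features:
        return 0.0
    tree = build_tree(train, features, depth)
    accuracy = sum(predict(tree, x) == y for x, y in valid) / len(valid)
    return accuracy - penalty * len(features)


def ga_select(train, valid, n_features, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve 0/1 attribute masks; return the fittest mask found and its fitness."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, train, valid)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        children = [m[:] for m in ranked[:2]]  # elitism: keep the two fittest masks
        while len(children) < pop_size:
            a = max(rng.sample(ranked, 2), key=score)  # size-2 tournament selection
            b = max(rng.sample(ranked, 2), key=score)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each bit flips with probability p_mut.
            children.append([1 - bit if rng.random() < p_mut else bit for bit in child])
        pop = children
    best = max(pop, key=score)
    return best, score(best)


# Synthetic toy data: 6 binary attributes; the label is attr0 OR attr2, the rest is noise.
data = [(x, int(x[0] or x[2])) for x in itertools.product((0, 1), repeat=6)]
best_mask, best_fit = ga_select(data, data, n_features=6)
print("selected attributes:", [i for i, bit in enumerate(best_mask) if bit], round(best_fit, 2))
```

The penalty term is what makes the search prefer smaller subsets; without it, on this toy data, a mask that adds noise attributes to attributes 0 and 2 could score as well as the minimal one.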
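
The stacking step can also be sketched in pure Python. A common way to build the meta-learner's training set, assumed here because the abstract does not specify it, is to use out-of-fold outputs of the primary learners, so that the meta-learner never sees predictions made on rows those learners were trained on. The sketch assumes a linear SVM trained by subgradient descent on the hinge loss, a nearest-centroid scorer as a hypothetical stand-in for the CNN (a real CNN does not fit in a short dependency-free example), and a single fully connected unit with a sigmoid output as the meta-learner. Function names, hyperparameters, and the toy data are all illustrative.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Primary learner 1: linear SVM (hinge loss + L2), full-batch subgradient descent.

    Labels are 0/1 and mapped to -1/+1 internally; returns the decision function w.x + b.
    """
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            s = 1.0 if yi == 1 else -1.0
            if s * (dot(w, xi) + b) < 1.0:  # hinge loss is active for this sample
                gw = [g - s * v / n for g, v in zip(gw, xi)]
                gb -= s / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return lambda x: dot(w, x) + b


def fit_nearest_centroid(X, y):
    """Primary learner 2: hypothetical stand-in for the CNN, a score in (0, 1)."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(a, c):
        return sum((ai - ci) ** 2 for ai, ci in zip(a, c))

    mu_pos, mu_neg = centroid(1), centroid(0)
    return lambda x: sigmoid(sqdist(x, mu_neg) - sqdist(x, mu_pos))


def fit_dense_sigmoid(Z, y, lr=0.5, epochs=500):
    """Meta-learner: one fully connected unit with a sigmoid output, trained on log-loss."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for zi, yi in zip(Z, y):
            err = sigmoid(dot(w, zi) + b) - yi  # gradient of log-loss w.r.t. the logit
            gw = [g + err * v / n for g, v in zip(gw, zi)]
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return lambda z: sigmoid(dot(w, z) + b)


PRIMARY_LEARNERS = (fit_linear_svm, fit_nearest_centroid)


def fit_stacking(X, y, k=5):
    """Fit the meta-learner on out-of-fold primary outputs, then refit primaries on all rows."""
    meta_X = [[0.0] * len(PRIMARY_LEARNERS) for _ in X]
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        Xtr, ytr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        models = [fit(Xtr, ytr) for fit in PRIMARY_LEARNERS]
        for i in range(len(X)):
            if i % k == fold:  # score only rows these models were not trained on
                meta_X[i] = [m(X[i]) for m in models]
    meta = fit_dense_sigmoid(meta_X, y)
    final = [fit(X, y) for fit in PRIMARY_LEARNERS]
    return lambda x: meta([m(x) for m in final])


# Synthetic, linearly separable toy data: label 1 near (2, 2), label 0 near (-2, -2).
y = [i % 2 for i in range(20)]
X = [[(2.0 if t else -2.0) + 0.1 * (i % 5), (2.0 if t else -2.0) - 0.1 * (i % 3)]
     for i, t in enumerate(y)]
model = fit_stacking(X, y)
print(round(model([3.0, 3.0]), 3), round(model([-3.0, -3.0]), 3))
```

Because `fit_stacking` treats each primary learner as a function that takes training rows and returns a scoring function, swapping in real CNN and SVM models would only change `PRIMARY_LEARNERS`.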
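
The accuracy, precision, sensitivity, specificity, and F1-score used in the comparison all derive from the four confusion-matrix counts. A minimal sketch, assuming binary labels where 1 means "at risk of diabetes" (the paper's label encoding is not stated here) and using the hypothetical names Metrics and binary_metrics:

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    sensitivity: float  # recall, true positive rate
    specificity: float  # true negative rate
    f1: float


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero (e.g. no positive predictions)."""
    return num / den if den else 0.0


def binary_metrics(y_true: list[int], y_pred: list[int]) -> Metrics:
    """Compute the five metrics from confusion-matrix counts; label 1 means 'at risk'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)
    return Metrics(
        accuracy=safe_div(tp + tn, len(pairs)),
        precision=precision,
        sensitivity=sensitivity,
        specificity=safe_div(tn, tn + fp),
        f1=safe_div(2 * precision * sensitivity, precision + sensitivity),
    )


m = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 1, 0, 0, 0])
# tp=2, fn=2, fp=1, tn=3, so accuracy=0.625, sensitivity=0.5, specificity=0.75
print(m)
```

Sensitivity is the same quantity as recall; specificity is its counterpart for the negative class, which matters in screening because it measures how many healthy people are correctly not flagged.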

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
