Journal of King Saud University: Computer and Information Sciences (Jan 2024)

A novel evolutionary ensemble prediction model using harmony search and stacking for diabetes diagnosis

  • Zaiheng Zhang,
  • Yanjie Lu,
  • Mingtao Ye,
  • Wanyu Huang,
  • Lixu Jin,
  • Guodao Zhang,
  • Yisu Ge,
  • Alireza Baghban,
  • Qiwen Zhang,
  • Haiou Wang,
  • Wenzong Zhu

Journal volume & issue
Vol. 36, no. 1
p. 101873

Abstract

Diabetes is a serious disease characterized by elevated blood glucose levels, and undiagnosed diabetes can cause a host of related complications, such as retinopathy and nephropathy. Its main types are type 1 diabetes (T1DM), type 2 diabetes (T2DM) and gestational diabetes mellitus (GDM). Machine learning models and metaheuristic optimization algorithms can play an important role in the early detection, diagnosis and treatment of this disease. To this end, we propose AHDHS-Stacking, an ensemble learning framework for diabetes mellitus classification and diagnosis that is based on the harmony search (HS) algorithm and stacking and comprises two stages: feature selection and optimization of the base-learner combination. To improve the model’s overall performance, the average performance of all base learners is used as the feature selection objective, and an adaptive hyperparameter strategy is used to accelerate the iterative process. HS is then used to find the best combination of base learners, which improves model performance while reducing complexity. We conducted experiments on the Pima Indians Diabetes (PID) dataset and the Chinese and Western Medicine Diabetes (CWMD) dataset. On the PID dataset, the model achieved an accuracy of 93.09%, precision of 93.22%, recall of 91.60%, F-measure of 92.25%, and MCC of 84.79%, outperforming all benchmark models and confirming its validity. Experimental results on the CWMD dataset showed that AHDHS-Stacking identified key features such as age, gender, urinary glucose, fasting glucose, BMI and cholesterol, and that it can be used as a practical and accurate method for early diabetes prediction.
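
The selection step described above, in which HS searches for a good subset of base learners, can be illustrated with a minimal sketch of a generic binary harmony search. This is not the authors' implementation: the function names, the parameter values (hms, hmcr, par), the bit-flip form of pitch adjustment and the toy fitness function with made-up per-learner gains are all assumptions for illustration, and the adaptive hyperparameter strategy of AHDHS-Stacking is not reproduced. In a real setting, the fitness would be something like the cross-validated score of a stacking ensemble built from the selected learners.

```python
import random


def harmony_search(fitness, n_bits, hms=10, hmcr=0.9, par=0.3, iters=200, seed=0):
    """Generic binary harmony search: maximise fitness over 0/1 vectors."""
    rng = random.Random(seed)
    # Harmony memory: hms random candidate subsets and their scores.
    memory = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(hms)]
    scores = [fitness(h) for h in memory]
    for _ in range(iters):
        new = []
        for j in range(n_bits):
            if rng.random() < hmcr:
                # Memory consideration: copy bit j from a random stored harmony.
                bit = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    bit = 1 - bit  # pitch adjustment, here a bit flip (assumption)
            else:
                bit = rng.randint(0, 1)  # random selection
            new.append(bit)
        score = fitness(new)
        # Replace the worst stored harmony if the new one is better.
        worst = min(range(hms), key=scores.__getitem__)
        if score > scores[worst]:
            memory[worst], scores[worst] = new, score
    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical per-learner gains; NOT values from the paper.
GAINS = [0.30, 0.05, 0.25, 0.02, 0.20]
PENALTY = 0.10  # hypothetical complexity cost per selected base learner


def toy_fitness(mask):
    """Stand-in objective: total gain minus a complexity penalty."""
    return sum(g * b for g, b in zip(GAINS, mask)) - PENALTY * sum(mask)


if __name__ == "__main__":
    mask, score = harmony_search(toy_fitness, n_bits=len(GAINS))
    print("selected base learners:", [i for i, b in enumerate(mask) if b])
    print("fitness:", score)
```

The same routine could in principle serve the feature selection stage by letting each bit stand for a feature rather than a base learner; the penalty term is one simple way to express the trade-off between performance and complexity mentioned in the abstract.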

Keywords