Academy Journal of Science and Engineering (Oct 2024)

ENSEMBLE LEARNING AND FEATURE IMPORTANCE FOR PERSONALIZED DIABETES DIAGNOSIS

  • Saidu, I.R.,
  • Saleh, R.U.,
  • Muhammad Musa,
  • Abdulkadir, N.

Journal volume & issue
Vol. 18, no. 2
pp. 201 – 218

Abstract


Accurate, personalized diagnosis is key to effective diabetes management. This study develops an ensemble machine learning approach that combines XGBoost, Random Forest, and Support Vector Machine classifiers for multi-class prediction of diabetes type. The dataset comprises 280 patients from Nigeria who have type 1 diabetes, have type 2 diabetes, or are non-diabetic. After data preprocessing, the hyperparameters of the base classifiers were tuned using grid search, and soft voting was then applied to create the ensemble classifier. The ensemble model achieved an accuracy of 90.48%, outperforming both the individual base classifiers and previously reported classifiers. Feature importance analysis identified age, HbA1c, weight, and fasting plasma glucose as the top predictors of type 1 diabetes, while HbA1c, 2-hour plasma glucose, and fasting plasma glucose were most indicative of type 2 diabetes. Together, the ensemble framework and the type-specific feature analysis support personalized diagnosis by showing which attributes distinguish the diabetes types. The approach demonstrates potential to improve clinical decision-making through robust, personalized predictions. Future work will incorporate more risk factors and advanced feature selection techniques. The study has implications for advancing personalized medicine for diabetes.
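The soft-voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each base classifier outputs a probability for each of the three classes; the probability values, the class order, and the function names `soft_vote` and `predict_label` are hypothetical and are not taken from the study.

```python
from typing import List, Optional, Sequence

# Class order is an assumption made for this illustration.
CLASSES = ["non-diabetic", "type 1", "type 2"]


def soft_vote(
    probas: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the (weighted) mean class probabilities across classifiers."""
    if weights is None:
        weights = [1.0] * len(probas)  # equal weight per classifier
    total = sum(weights)
    n_classes = len(probas[0])
    return [
        sum(w * p[k] for w, p in zip(weights, probas)) / total
        for k in range(n_classes)
    ]


def predict_label(
    probas: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Pick the class with the highest averaged probability."""
    avg = soft_vote(probas, weights)
    best = max(range(len(avg)), key=avg.__getitem__)
    return CLASSES[best]


# Made-up probabilities for one patient, one row per base classifier.
example = [
    [0.90, 0.05, 0.05],  # hypothetical XGBoost output
    [0.45, 0.05, 0.50],  # hypothetical Random Forest output
    [0.40, 0.10, 0.50],  # hypothetical SVM output
]

print(soft_vote(example))
print(predict_label(example))
```

In this made-up case, two of the three classifiers individually favour "type 2", so a majority (hard) vote would choose it, while averaging the probabilities favours "non-diabetic" because the first classifier is far more confident. In scikit-learn, the same idea is available as `VotingClassifier(voting="soft")`; whether the authors used that class is not stated in the abstract.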

Keywords