IEEE Access (Jan 2024)

Hi-Le and HiTCLe: Ensemble Learning Approaches for Early Diabetes Detection Using Deep Learning and Explainable Artificial Intelligence

  • Ifra Shaheen,
  • Nadeem Javaid,
  • Nabil Alrajeh,
  • Yousra Asim,
  • Sheraz Aslam

DOI
https://doi.org/10.1109/ACCESS.2024.3398198
Journal volume & issue
Vol. 12
pp. 66516 – 66538

Abstract

Diabetes is a metabolic disease caused by the body’s failure to use insulin or break down food correctly. Every year, an alarming number of new cases of diabetes are recorded. A poor lifestyle and an unfavorable environment are the two main causes of diabetes. If it is not treated at an early stage, it becomes a lifelong disease and can lead to failure of important organs such as the kidneys, heart, and eyes. This danger can be reduced through timely and precise identification. Recent developments in Deep Learning (DL) for clinical use have shown it to be a highly effective method for disease prediction. We propose two ensemble learning approaches, blending and hybrid, using the Diabetes Prediction Dataset (DPD), which is highly imbalanced: it contains 8,500 diabetic patients and 91,500 non-diabetic individuals. To overcome the class imbalance problem, the Proximity-Weighted Synthetic Oversampling (ProWSyn) technique is applied. We propose a hybrid of the Highway and LeNet models, named Hi-Le, for early and accurate diabetes detection. The Hi-Le model achieves an accuracy of 94%, an F1-score of 96%, a precision of 94%, and a recall of 95%, and outperforms its individual models on all four metrics. We also propose a blending model named HiTCLe, which combines Highway, LeNet, and a Temporal Convolutional Network (TCN) to detect and predict diabetes at an early stage. HiTCLe performs best: it achieves an accuracy of 94% and an F1-score of 94%, whereas its individual models (Highway, TCN, and LeNet) achieve accuracies between 89% and 91% over 10 epochs. To validate the models’ results, we apply K-Fold Cross Validation (K-FCV). To quantify the contribution of each feature, we apply Shapley Additive eXplanations (SHAP) as a post-processing technique. Both ensemble learning models outperform their individual models in terms of accurate diabetes detection and prediction.
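
The class counts quoted in the abstract can be turned into a quick worked example of how severe the imbalance is and how cross-validation folds might be built on such data. The following is a minimal sketch, not the authors' pipeline: the function names `imbalance_summary` and `stratified_k_fold` are hypothetical, the 1:1 balancing target is an assumption (the abstract does not state the ratio ProWSyn was run to), and stratifying the folds is also an assumption (the abstract says only that K-FCV was used).

```python
import random

# Class counts as stated in the abstract.
N_DIABETIC = 8_500
N_NON_DIABETIC = 91_500


def imbalance_summary(n_minority, n_majority):
    """Summarise class imbalance and the oversampling needed for a 1:1 balance.

    The 1:1 target is an assumption made for illustration only.
    """
    total = n_minority + n_majority
    return {
        "total": total,
        "minority_share": n_minority / total,
        "majority_to_minority_ratio": n_majority / n_minority,
        "synthetic_needed_for_balance": n_majority - n_minority,
    }


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve class proportions.

    Stratification is an assumption; plain K-fold would ignore the labels.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


summary = imbalance_summary(N_DIABETIC, N_NON_DIABETIC)

# Toy 1,000-row label vector with the same 8.5% minority share as the DPD.
labels = [1] * 85 + [0] * 915
splits = list(stratified_k_fold(labels, k=5))
```

Under these assumptions, `summary` reports a minority share of 8.5% and a gap of 83,000 samples between the two classes, which is the number of synthetic minority samples a 1:1 oversampling target would require. Oversampling would normally be applied to the training indices of each fold only, so that synthetic samples never leak into the held-out fold.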

Keywords