IEEE Access (Jan 2022)

PE_DIM: An Efficient Probabilistic Ensemble Classification Algorithm for Diabetes Handling Class Imbalance Missing Values

  • Liyan Jia,
  • Zhiping Wang,
  • Siqi Lv,
  • Zhaohui Xu

DOI
https://doi.org/10.1109/ACCESS.2022.3212067
Journal volume & issue
Vol. 10
pp. 107459 – 107476

Abstract

Diabetes has become one of the seven major diseases contributing to human mortality, so early prediction is critical for prevention. Many existing studies, however, predict diabetes with little consideration of missing and imbalanced data. To overcome these problems, we propose an efficient Probabilistic Ensemble classification algorithm for Diabetes handling class Imbalance Missing values (PE_DIM), which handles missing values and class imbalance and improves classification accuracy. First, a novel method based on Local Median-based Gaussian Naive Bayes (LMeGNB) is proposed to impute the missing values; it is combined with the K-means SMOTE method, which rebalances the positive and negative diabetes samples, to obtain normalized, balanced data. Then, a probability-based multi-stage ensemble builds ensemble models from different types of machine learning algorithms. When extreme gradient boosting, random forests, and weighted $k$-nearest neighbors are integrated, the highest classification accuracy, 94.53%, is obtained on the Pima Indian diabetes dataset. Finally, to evaluate the PE_DIM model, the experiments also consider two further diabetes datasets, RSMH and Tabriz, to demonstrate the generality of the method in diabetes prediction. Additionally, several statistical tests on the area under the receiver operating characteristic curve (AUC) are used to compare the performance of the different classification methods. The final results show that the proposed method achieves the best average rank under 5-fold cross-validation and differs significantly from the base classifiers. The proposed method effectively addresses missing values and class imbalance in diabetes data and can play a significant role in intelligent medical treatment and diabetes research.
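
The two ideas at the core of the abstract, median-based imputation and a probability-based ensemble, can be illustrated with a minimal sketch. This is not the authors' LMeGNB or their multi-stage ensemble, whose details the abstract does not give. It is a simplified stand-in that assumes missing entries are encoded as `None`, fills each one with the median of that feature within the same class, and combines models by averaging their predicted positive-class probabilities (soft voting). The function names `impute_class_median` and `soft_vote` are hypothetical, and the probabilities in the usage note are made-up illustrative numbers, not results from the paper.

```python
from statistics import median


def impute_class_median(rows, labels):
    """Fill None entries with the median of that feature within the same class.

    Simplified stand-in (assumption): the paper's LMeGNB imputation is more
    involved; this only illustrates the median-based idea.
    """
    n_features = len(rows[0])
    medians = {}
    for cls in set(labels):
        for j in range(n_features):
            observed = [
                r[j] for r, y in zip(rows, labels)
                if y == cls and r[j] is not None
            ]
            # Fall back to 0.0 if a class has no observed value for a feature.
            medians[(cls, j)] = median(observed) if observed else 0.0
    return [
        [medians[(y, j)] if v is None else v for j, v in enumerate(r)]
        for r, y in zip(rows, labels)
    ]


def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Combine per-model positive-class probabilities by weighted averaging.

    prob_lists[m][i] is model m's probability that sample i is positive.
    Returns (combined probabilities, predicted 0/1 labels).
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models  # equal weights by default
    total = sum(weights)
    n_samples = len(prob_lists[0])
    combined = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]
    predictions = [1 if p >= threshold else 0 for p in combined]
    return combined, predictions
```

For example, with three hypothetical base models (standing in for extreme gradient boosting, random forests, and weighted $k$-nearest neighbors) that output `[0.9, 0.2]`, `[0.6, 0.4]`, and `[0.3, 0.3]` for two patients, `soft_vote` averages them to roughly `0.6` and `0.3`, so only the first patient is predicted positive.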

Keywords