Scientific Reports (Nov 2024)
Robust diabetic prediction using ensemble machine learning models with synthetic minority over-sampling technique
Abstract
Abstract This paper addresses the pressing issue of diabetes, which is a widespread condition affecting a huge population worldwide. As cells become less responsive to insulin or fail to produce it adequately, blood sugar levels rise. This has the potential to cause severe health complications including kidney disease, vision impairment and heart conditions. Early diagnosis is paramount in mitigating the risk and severity of diabetes-related complications. To tackle this, we proposed a robust framework for diabetes prediction using Synthetic Minority Over-sampling Technique (SMOTE) with ensemble machine learning techniques. Our approach incorporates strategies such as imputation of missing values, outlier rejection, feature selection using correlation analysis and class distribution balancing using SMOTE. The extensive experimentation shows that the proposed combination of AdaBoost and XGBoost shows exceptional performance, with an impressive AUC of 0.968+/-0.015. This outperforms not only alternative methodologies presented in our study but also surpasses current state-of-the-art results. We anticipate that our model will significantly improve diabetes prediction, offering a promising avenue for improved healthcare outcomes in diabetes management.
Keywords