International Journal of Computational Intelligence Systems (Nov 2024)
Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms
Abstract
Diabetes mellitus is one of the leading causes of death worldwide. If diabetes is not diagnosed and treated early, it can cause several other health problems, such as kidney disease, nerve disease, vision problems, and brain issues. Early detection of diabetes reduces healthcare costs and minimizes the chance of serious complications. In this work, we propose an e-diagnostic model for diabetes classification via machine learning (ML) algorithms that can be executed on the Internet of Medical Things (IoMT). The study uses and analyses two benchmark datasets, the PIMA Indian Diabetes Dataset (PIDD) and the Behavioral Risk Factor Surveillance System (BRFSS) diabetes dataset, to classify diabetes. The proposed model consists of random oversampling to balance the class distribution, interquartile range-based outlier detection to eliminate outlier data, and the Boruta algorithm to select the optimal features from the datasets. The proposed approach considers four ML algorithms, namely random forest, gradient boosting, light gradient boosting, and decision trees, as they are widely used classification algorithms for diabetes prediction. We evaluated all four ML algorithms via performance indicators such as accuracy, F1 score, recall, precision, and AUC-ROC. Comparative analysis suggests that the random forest algorithm outperforms the remaining classifiers, with the highest accuracy of 92% on the BRFSS diabetes dataset and 94% on the PIDD dataset, an improvement of 3% over the accuracy reported in existing research. This research can assist diabetologists in developing accurate treatment regimens for patients with diabetes.
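The two preprocessing steps named in the abstract, interquartile range-based outlier removal and random oversampling, can be illustrated with a minimal sketch in standard-library Python. This is not the authors' implementation: the function names `iqr_filter` and `random_oversample`, the conventional 1.5 × IQR fence, the order of the two steps, and the toy two-feature rows are all assumptions made for illustration. Boruta feature selection and the classifiers are omitted.

```python
import random
import statistics
from collections import Counter


def iqr_filter(rows, labels, k=1.5):
    """Drop rows with any feature outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional fence; the abstract does not state the value used.
    """
    bounds = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    kept = [
        (row, label)
        for row, label in zip(rows, labels)
        if all(lo <= value <= hi for value, (lo, hi) in zip(row, bounds))
    ]
    return [row for row, _ in kept], [label for _, label in kept]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        extra = rng.choices(pool, k=target - count)  # sampling with replacement
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: two made-up features per row, label 1 = diabetic.
rows = [
    [85, 22.0], [90, 24.5], [95, 23.1], [100, 26.0], [105, 27.3],
    [110, 25.2], [150, 33.0], [160, 35.5], [500, 30.0],
]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]

# Assumed order: remove outliers first, then balance the classes.
clean_rows, clean_labels = iqr_filter(rows, labels)
balanced_rows, balanced_labels = random_oversample(clean_rows, clean_labels)

print(len(rows), len(clean_rows), len(balanced_rows))
print(Counter(balanced_labels))
```

In this toy run the row with the extreme first feature (500) is discarded by the IQR fence, and the remaining two minority-class rows are resampled until both classes have six rows each.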
Keywords