Random Oversampling-Based Diabetes Classification via Machine Learning Algorithms

G. R. Ashisha; X. Anitha Mary; E. Grace Mary Kanaga; J. Andrew; R. Jennifer Eunice

doi:10.1007/s44196-024-00678-3

International Journal of Computational Intelligence Systems (Nov 2024)

Affiliations

G. R. Ashisha: Department of Electronics and Instrumentation Engineering, Karunya Institute of Technology and Sciences
X. Anitha Mary: Department of Robotics Engineering, Karunya Institute of Technology and Sciences
E. Grace Mary Kanaga: Department of Computer Science Engineering, Karunya Institute of Technology and Sciences
J. Andrew: Department of Computer Science Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education
R. Jennifer Eunice: Department of Mechatronics Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education

DOI: https://doi.org/10.1007/s44196-024-00678-3
Journal volume & issue: Vol. 17, no. 1, pp. 1–17

Abstract

Diabetes mellitus is considered one of the main causes of death worldwide. If diabetes is not diagnosed and treated early, it can cause several other health problems, such as kidney disease, nerve disease, vision problems, and brain issues. Early detection of diabetes reduces healthcare costs and minimizes the chance of serious complications. In this work, we propose an e-diagnostic model for diabetes classification via machine learning (ML) algorithms that can be executed on the Internet of Medical Things (IoMT). The study uses and analyses two benchmark datasets, the PIMA Indian Diabetes Dataset (PIDD) and the Behavioral Risk Factor Surveillance System (BRFSS) diabetes dataset, to classify diabetes. The proposed model consists of random oversampling to balance the class distribution, interquartile range (IQR)-based outlier detection to eliminate outlier data, and the Boruta algorithm to select the optimal features from the datasets. The proposed approach considers four ML algorithms, namely random forest, gradient boosting, light gradient boosting, and decision tree classifiers, as they are widely used for diabetes prediction. We evaluated all four ML algorithms via performance indicators such as accuracy, F1 score, recall, precision, and AUC-ROC. Comparative analysis suggests that the random forest algorithm outperforms the remaining classifiers, with the greatest accuracy of 92% on the BRFSS diabetes dataset and 94% on the PIDD dataset, which is 3% higher than the accuracy reported in existing research. This research can assist diabetologists in developing accurate treatment regimens for patients who are diabetic.
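
The two preprocessing steps named in the abstract, IQR-based outlier removal and random oversampling, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `iqr_filter` and `random_oversample`, the conventional 1.5 × IQR fence, the order of the two steps, and the toy glucose/BMI rows are all assumptions made for illustration. Boruta feature selection and the classifiers are omitted.

```python
import random
from collections import Counter
from statistics import quantiles


def iqr_filter(rows, labels, k=1.5):
    """Drop rows with any feature outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional fence; the abstract does not state the value used.
    """
    bounds = []
    for j in range(len(rows[0])):
        q1, _, q3 = quantiles([r[j] for r in rows], n=4)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    kept = [
        (r, y)
        for r, y in zip(rows, labels)
        if all(lo <= v <= hi for v, (lo, hi) in zip(r, bounds))
    ]
    return [r for r, _ in kept], [y for _, y in kept]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: [glucose, BMI]; label 1 = diabetic, 0 = non-diabetic.
toy_rows = [
    [85, 22.0], [90, 24.5], [95, 26.0], [100, 27.5], [105, 29.0],
    [110, 30.5], [150, 33.0], [160, 35.0], [900, 31.0],  # 900 is an outlier
]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]

clean_rows, clean_labels = iqr_filter(toy_rows, toy_labels)
bal_rows, bal_labels = random_oversample(clean_rows, clean_labels)
print(Counter(clean_labels), Counter(bal_labels))
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data; the abstract does not say how the authors handled this.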

Published in International Journal of Computational Intelligence Systems

ISSN: 1875-6891 (Print); 1875-6883 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.springer.com/journal/44196