International Journal of Management and Data Analytics (Aug 2024)

Performance of Machine Learning Classifiers for Diabetes Prediction

  • Mijala Manandhar
  • Shaikat Baidya
  • Babalpreet Kaur
  • Katia Atoji

Journal volume & issue
Vol. 4, no. 1
pp. 1 – 8

Abstract

In this study, machine learning (ML) classifiers were evaluated for their effectiveness in predicting diabetes using the Pima Indians Diabetes Database. The dataset comprised 768 instances with nine attributes, including a target variable indicating whether a patient tested positive for diabetes. The classifiers were grouped into Function (Logistic Regression, Multilayer Perceptron, Stochastic Gradient Descent), Rules (Decision Table, JRip, OneR), and Trees (Decision Stump, Hoeffding Tree, J48). Performance metrics such as accuracy, precision, recall, Matthews Correlation Coefficient, ROC Area, and F1-measure were used to compare the classifiers. Among the Function classifiers, Stochastic Gradient Descent (SGD) demonstrated the highest performance, particularly in handling large datasets and minimizing overfitting. Logistic Regression and Multilayer Perceptron also showed robust results, but SGD was superior on most metrics. Among the Rules classifiers, JRip outperformed the others due to its iterative rule optimization, whereas OneR's simplicity resulted in the lowest performance. Decision Table offered a clear representation of decision rules but was limited by the complexity of the dataset. In the Trees group, J48 was the most effective, benefiting from its ability to handle complex interactions and numerous features. The study highlights the potential of ML algorithms in early diabetes detection, enabling timely intervention and personalized management strategies. The importance of key predictors such as plasma glucose, BMI, and age was emphasized. Future research should focus on integrating multiple datasets and exploring more complex ML algorithms to enhance prediction accuracy and generalization. The development of real-time predictive systems is crucial for improving clinical processes and patient outcomes.
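
As an illustration of the comparison metrics named in the abstract, the following minimal sketch computes accuracy, precision, recall, F1-measure, and the Matthews Correlation Coefficient from the four cells of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study; ROC Area is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return common binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Undefined ratios (zero denominator) are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Matthews Correlation Coefficient: ranges from -1 to 1 and uses all four cells.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; these are NOT results from the paper.
example = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because the Pima dataset is imbalanced between positive and negative cases, a metric such as the Matthews Correlation Coefficient, which accounts for all four cells, can complement accuracy when ranking classifiers.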

Keywords