Optimizing machine learning algorithms for diabetes data: A metaheuristic approach to balancing and tuning classifiers parameters

Hauwau Abdulrahman Aliyu; Ibrahim Olawale Muritala; Habeeb Bello-Salau; Salisu Mohammed; Adeiza James Onumanyi; Ore-Ofe Ajayi

Franklin Open (Sep 2024)

Affiliations

Hauwau Abdulrahman Aliyu: Department of Biochemistry and Molecular Biology, Federal University Birnin-Kebbi, 1157, Kebbi, Nigeria
Ibrahim Olawale Muritala: Department of Computer Engineering, Ahmadu Bello University, Zaria, 810107, Nigeria; Corresponding author.
Habeeb Bello-Salau: Department of Computer Engineering, Ahmadu Bello University, Zaria, 810107, Nigeria
Salisu Mohammed: Department of Maintenance Engineering, KRPC Ltd, Kaduna Nigerian National Petroleum Company, Kaduna, 800242, Nigeria
Adeiza James Onumanyi: AIoT, Next Generation Enterprises and Institutions, Council for Scientific and Industrial Research (CSIR), Pretoria, 0001, South Africa
Ore-Ofe Ajayi: Department of Computer Engineering, Ahmadu Bello University, Zaria, 810107, Nigeria

Journal volume & issue: Vol. 8, p. 100153

Abstract

Diabetes mellitus is a global health concern, which has prompted the development of machine learning models that classify patients accurately and so enable precise diagnosis and early-stage treatment. However, the performance of such classifiers depends on the training datasets, which are often imbalanced, leading to biased classification and inaccurate diagnoses. Previous work using random sampling (under-sampling and over-sampling) and the Synthetic Minority Oversampling Technique (SMOTE) has struggled to produce optimally balanced datasets. In addition, choosing the best hyperparameters for machine learning classifiers remains a challenging task. To address these issues, this research develops a metaheuristic optimization approach for diabetes data balancing and classifier hyperparameter tuning. It uses the Particle Swarm Optimization (PSO) algorithm to balance the diabetes data and a genetic algorithm to select the optimal architecture for various machine learning classifiers. The study evaluates the proposed metaheuristic data balancer and classifier architecture parameter tuner using classification metrics (F1 score, Average Precision–Recall (APR), AUC, and accuracy). The PSO-balanced dataset emerges as the most effective for classifying diabetes, with Average Percentage Improvements (API) in classification performance of 20.78% in accuracy, 16.79% in area under the receiver operating characteristic curve, and a notable 32.78% in APR. Moreover, the XGBoost classifier trained with a genetic algorithm shows the shortest computational training time on the Centers for Disease Control and Prevention (CDC) diabetes dataset compared with the artificial neural network and random forest classifiers. Notably, the imbalanced CDC diabetes dataset yields the lowest APR compared with the random under-sampling and PSO data balancing techniques.
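
The abstract does not give implementation details, so the following is only a minimal sketch of how a genetic algorithm can search classifier hyperparameters, not the authors' method. The search space, the parameter names (`n_estimators`, `max_depth`, `learning_rate`), and the `toy_fitness` function are all hypothetical; in practice the fitness would be a cross-validated score (for example APR) of a classifier trained on the balanced data.

```python
import random

# Hypothetical search space; names and values are illustrative assumptions.
SEARCH_SPACE = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [2, 4, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}


def toy_fitness(params):
    """Stand-in for a cross-validated score (assumption); higher is better."""
    return -(
        abs(params["n_estimators"] - 200) / 400
        + abs(params["max_depth"] - 6) / 8
        + abs(params["learning_rate"] - 0.1)
    )


def random_individual(rng):
    # One candidate = one value drawn for every hyperparameter.
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal chance.
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    # With probability `rate`, resample a gene from its allowed values.
    return {
        name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
        for name, value in individual.items()
    }


def genetic_search(fitness, pop_size=12, generations=15, elite=2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        # Elitism: the best candidates survive unchanged.
        children = [dict(p) for p in population[:elite]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness)


best = genetic_search(toy_fitness)
print(best, toy_fitness(best))
```

Because the top candidates are copied unchanged into each new generation, the best fitness never decreases between generations. Replacing `toy_fitness` with a function that trains and scores a real classifier would turn this sketch into an actual tuner, at the cost of one model fit per evaluation.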

Published in Franklin Open

ISSN: 2773-1871 (Print); 2773-1863 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Technology
Website: https://www.sciencedirect.com/journal/franklin-open
