Feature importance and model performance for prediabetes prediction: A comparative study

Saeed Awad M Alqahtani; Hussah M Alobaid; Jamilah Alshammari; Safa A Alqarzae; Sheka Yagub Aloyouni; Ahood A. Al-Eidan; Salwa Alhamad; Abeer Almiman; Fadwa M Alkhulaifi; Suliman Alomar

Journal of King Saud University: Science (Dec 2024)

Affiliations

Saeed Awad M Alqahtani: Department of Basic Medical Sciences, College of Medicine, Taibah University, Medina, Saudi Arabia
Hussah M Alobaid: Department of Zoology, College of Science, King Saud University, Riyadh, Saudi Arabia
Jamilah Alshammari: Department of Zoology, College of Science, King Saud University, Riyadh, Saudi Arabia
Safa A Alqarzae: Department of Zoology, College of Science, King Saud University, Riyadh, Saudi Arabia
Sheka Yagub Aloyouni: Genetics section, Research Department, Natural and Health Sciences Research Center, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
Ahood A. Al-Eidan: Department of Biology, College of Science, Imam Abdulrahman Bin Faisal University, P.O.Box 1982, Dammam 34212, Saudi Arabia
Salwa Alhamad: Department of Biology, College of Science, Imam Abdulrahman Bin Faisal University, P.O.Box 1982, Dammam 34212, Saudi Arabia
Abeer Almiman: Department of Biology, College of Science, Imam Abdulrahman Bin Faisal University, P.O.Box 1982, Dammam 34212, Saudi Arabia
Fadwa M Alkhulaifi: Department of Biology, College of Science, Imam Abdulrahman Bin Faisal University, P.O.Box 1982, Dammam 34212, Saudi Arabia; Corresponding author.
Suliman Alomar: Department of Zoology, College of Science, King Saud University, Riyadh, Saudi Arabia

Journal volume & issue: Vol. 36, no. 11, p. 103583

Abstract

Objectives: Prediabetes is a significant health condition that elevates the risk of developing type 2 diabetes and other associated complications. This study aims to (1) explore the potential of machine learning models to improve the prediction of prediabetes, (2) compare the performance of various machine learning models with traditional regression methods, and (3) identify the most influential demographic, socioeconomic, and health-related factors associated with prediabetes.

Methods: This study used data from the 2021 Behavioral Risk Factor Surveillance System (BRFSS) and applied comprehensive data preprocessing techniques. Logistic regression analysis was conducted to assess associations between features and prediabetes risk. Feature importance was quantified using Adjusted Mutual Information values. Multiple machine learning models, including Random Forest, K Nearest Neighbors (KNN), Extreme Gradient Boosting (XGBoost), Neural Network, and Logistic Regression, were used for prediction. The best model was selected and validated through cross-validation to ensure robustness.

Results: Significant associations were observed between prediabetes and key predictors such as cholesterol levels, BMI categories, hypertension status, age groups, and income categories. Among the models tested, Random Forest demonstrated the highest accuracy and robustness, outperforming traditional regression models.

Conclusions: This study highlights the potential of machine learning to enhance prediabetes prediction and underscores the importance of identifying high-risk individuals for early intervention. The findings contribute to population health strategies by integrating advanced analytical methods with public health data.
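
The abstract says that feature importance was quantified with Adjusted Mutual Information (AMI) but gives no formula or code. The sketch below is illustrative only and is not the authors' implementation. It computes AMI between one categorical feature and a binary outcome using the standard chance-corrected definition (mutual information minus its expected value under random labelings, normalized by the mean entropy). The function names, the feature names, and the tiny toy columns are hypothetical and made up for the example; they are not BRFSS data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    joint = Counter(zip(x, y))
    return sum(
        (nij / n) * math.log(n * nij / (count_x[a] * count_y[b]))
        for (a, b), nij in joint.items()
    )


def expected_mutual_information(x, y):
    """Expected MI of random labelings that keep the marginals of x and y
    (hypergeometric model). Exact integer binomials are fine for small
    samples; large samples would call for a log-gamma formulation."""
    n = len(x)
    emi = 0.0
    for ai in Counter(x).values():
        for bj in Counter(y).values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                # Chance of exactly nij items falling in both groups.
                prob = (
                    math.comb(bj, nij)
                    * math.comb(n - bj, ai - nij)
                    / math.comb(n, ai)
                )
                emi += prob * (nij / n) * math.log(n * nij / (ai * bj))
    return emi


def adjusted_mutual_information(x, y):
    """AMI with arithmetic-mean entropy normalization.
    Returns 0.0 when the normalizer is not positive (an assumption made
    here to keep the sketch simple)."""
    mi = mutual_information(x, y)
    emi = expected_mutual_information(x, y)
    denom = (entropy(x) + entropy(y)) / 2 - emi
    return (mi - emi) / denom if denom > 0 else 0.0


# Hypothetical toy data, invented for illustration only.
prediabetes = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "high_cholesterol": [1, 1, 1, 0, 0, 0, 1, 0],
    "random_flag": [0, 1, 0, 1, 0, 1, 0, 1],
}

scores = {
    name: adjusted_mutual_information(values, prediabetes)
    for name, values in features.items()
}

# Rank the features from most to least informative about the outcome.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

In this toy setup a feature that matches the outcome exactly scores 1, while a feature that carries little information scores well below that. On a real survey dataset, continuous variables would need to be binned into categories before a score of this kind could be computed.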

Published in Journal of King Saud University: Science

ISSN: 1018-3647 (Print)
Publisher: Elsevier
Country of publisher: Saudi Arabia
LCC subjects: Science: Science (General)
Website: http://www.journals.elsevier.com/journal-of-king-saud-university-science/
