Interdisciplinary Journal of Epidemiology and Public Health (Jun 2024)

Enhancing Obesity Detection Through SMOTE-based Classification Models: A Comparative Study

  • John Kamwele Mutinda,
  • Amos Langat,
  • Regis Konan Marcel Djaha,
  • Jackson Ndoto Munyao,
  • Lee Whitaker,
  • Millicent Auma Omondi

DOI
https://doi.org/10.18041/2665-427X/ijeph.1.11532
Journal volume & issue
Vol. 7, no. 1

Abstract

Objective: To use the Synthetic Minority Over-sampling Technique (SMOTE) to improve class balance and to compare the performance of different classification methods before and after applying SMOTE. Methods: The study used a dataset from Kaggle consisting of several health-related features linked to obesity prediction. A check revealed class imbalance within the dataset, which affected initial model performance. SMOTE was applied to synthetically increase the representation of minority classes, reducing the class imbalance. The analysis was conducted in two stages: (1) training and testing the classification algorithms before applying SMOTE, and (2) training and testing the same models after applying SMOTE to enhance class balance. The performance of all models was evaluated on the same metrics before and after the SMOTE application. Results: Before SMOTE, Logistic Regression and Naive Bayes struggled with low sensitivity and specificity, and KNN (k=5) showed poor specificity. Significant improvements were observed across all models after applying SMOTE. For Logistic Regression, despite a decrease in accuracy (-8.8), sensitivity and specificity increased substantially (+56.7%), and balanced accuracy improved (+16.6%). Naive Bayes saw a modest accuracy increase (+2.3%), with sensitivity and specificity improving (+47.9%). The KNN classifier showed the largest gains, with sensitivity and specificity increasing (+96.0%) and balanced accuracy improving (+28.3%). Deep Learning showed a significant increase in sensitivity (+69.8%) and balanced accuracy (+29.4%), and an improvement in precision and F1-score, despite a slight decrease in specificity (-10.9%). Conclusion: The results demonstrate that, while there may be slight trade-offs, the overall improvements in key metrics such as sensitivity, specificity, balanced accuracy, precision, and F1-score affirm the utility of SMOTE in enhancing model performance on imbalanced datasets.
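The abstract does not say how SMOTE or the metrics were implemented, so the following is only a minimal sketch of the two ideas it relies on: creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, and reporting sensitivity, specificity, and balanced accuracy alongside plain accuracy. The function names `smote_sketch` and `binary_metrics`, the toy data, and the binary (two-class) simplification are all assumptions made for illustration, not the authors' code or data.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic points, each placed on the
    segment between a minority sample and one of its k nearest minority
    neighbours (the core interpolation idea behind SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours are searched among the other minority samples only.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: accuracy, sensitivity, specificity, and balanced
    accuracy for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy data (not from the study): four minority points on the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_sketch(minority, n_new=6, k=2)

# A classifier that always predicts the majority class on a 2-vs-8 split:
# accuracy is 0.8, yet sensitivity is 0.0 and balanced accuracy is 0.5.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
metrics = binary_metrics(y_true, y_pred)
```

The toy metrics show why accuracy alone can mislead on imbalanced data, and why a resampled model may lose some accuracy while gaining sensitivity and balanced accuracy. In practice, oversampling of this kind would normally be applied to the training split only, and a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would typically be preferred over a hand-written one.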

Keywords