Enhancing Obesity Detection Through SMOTE-based Classification Models: A comparative Study

John Kamwele Mutinda; Amos Langat; Regis Konan Marcel Djaha; Jackson Ndoto Munyao; Lee Whitaker; Millicent Auma Omondi

doi:10.18041/2665-427x/ijeph.1.11532

Interdisciplinary Journal of Epidemiology and Public Health (Jun 2024)

Affiliations

John Kamwele Mutinda: University of Science and Technology of China, Langfang, Hebei, China, People’s Republic of China
Amos Langat: Department of Mathematics, Technology and Innovation-JKUAT, Pan African University Institute for Basic Sciences, Nairobi, Kenya
Regis Konan Marcel Djaha: Basque Center for Applied Mathematics, Bilbao, Basque, Spain
Jackson Ndoto Munyao: African Institute for Mathematical Sciences, Limbe, Cameroon
Lee Whitaker: African Institute for Mathematical Sciences, Limbe, Cameroon
Millicent Auma Omondi: South Eastern Kenya University, Kitui County, Kenya

DOI: https://doi.org/10.18041/2665-427x/ijeph.1.11532
Journal volume & issue: Vol. 7, no. 1

Abstract

Objective: To use SMOTE to enhance class balance and to compare the performance of different classification methods before and after applying SMOTE. Methods: The study used a dataset from Kaggle consisting of several health-related features linked to obesity prediction. A check of the dataset revealed class imbalance, which affected initial model performance. SMOTE was applied to synthetically increase the representation of minority classes, reducing the class imbalance. The analysis was conducted in two stages: (1) training and testing the classification algorithms before applying SMOTE; (2) training and testing the same models after applying SMOTE to enhance class balance. The performance of all models was evaluated on the same metrics before and after the SMOTE application. Results: Before SMOTE, Logistic Regression and Naive Bayes struggled with low sensitivity and specificity, and KNN (k=5) showed poor specificity. Significant improvements were observed across all models after applying SMOTE. For Logistic Regression, despite a decrease in accuracy (-8.8), sensitivity and specificity increased substantially (+56.7%), with balanced accuracy improving (+16.6%). Naive Bayes saw a modest accuracy increase (+2.3%), with sensitivity and specificity improving (+47.9%). The KNN classifier showed a marked enhancement, with sensitivity and specificity increasing (+96.0%) and balanced accuracy rising (+28.3%). Deep Learning showed significant increases in sensitivity (+69.8%) and balanced accuracy (+29.4%), along with improvements in precision and F1-score, despite a slight decrease in specificity (-10.9%). Conclusion: The results demonstrate that, while there may be slight trade-offs, the overall improvements in key metrics such as sensitivity, specificity, balanced accuracy, precision, and F1-score affirm the utility of SMOTE in enhancing model performance for imbalanced datasets.
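
To make the oversampling step and the balanced-accuracy metric in the abstract concrete, here is a minimal pure-Python sketch. It is not the authors' implementation, which the abstract does not describe. The function names `smote` and `balanced_accuracy`, the toy points, and the parameter choices are assumptions made for illustration only. The sketch interpolates each synthetic point between a minority sample and one of its k nearest minority neighbours, and computes balanced accuracy as the mean of sensitivity and specificity.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch only: brute-force neighbour search, numeric features.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 8 majority points and 3 minority points.
majority = [(float(i % 3), float(i // 3)) for i in range(8)]
minority = [(5.0, 5.0), (6.0, 5.5), (5.5, 6.5)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points

# Hypothetical predictions, used only to exercise the metric.
score = balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

In a before/after comparison like the one described, only the training split would normally be oversampled, so that the test metrics still reflect the original class distribution. Whether the study followed that practice is not stated in the abstract.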

Published in Interdisciplinary Journal of Epidemiology and Public Health

ISSN: 2665-427X (Online)
Publisher: Universidad Libre
Country of publisher: Colombia
LCC subjects: Medicine: Public aspects of medicine
Website: https://revistas.unilibre.edu.co/index.php/iJEPH/issue/view/656