BMC Bioinformatics (Jan 2024)

The effect of data balancing approaches on the prediction of metabolic syndrome using non-invasive parameters based on random forest

  • Sahar Mohseni-Takalloo,
  • Hadis Mohseni,
  • Hassan Mozaffari-Khosravi,
  • Masoud Mirzaei,
  • Mahdieh Hosseinzadeh

DOI
https://doi.org/10.1186/s12859-024-05633-9
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 14

Abstract

Background: Metabolic syndrome (MetS) is a cluster of metabolic abnormalities (including obesity, insulin resistance, hypertension, and dyslipidemia) that can be used to identify populations at risk of diabetes and cardiovascular diseases, the main causes of morbidity and mortality worldwide. A simple approach for diagnosing MetS without biochemical tests would therefore be valuable. The present study aimed to predict MetS from non-invasive features using the random forest learning algorithm. To deal with the class imbalance that naturally exists in this type of data, the effect of two data balancing approaches, the Synthetic Minority Over-sampling Technique (SMOTE) and Random Splitting data balancing (SplitBal), on model performance was also investigated.

Results: The most important determinant for MetS prediction was waist circumference. When the random forest learning algorithm was applied to the imbalanced data, the trained models reached accuracies of 86.9% and 79.4% and sensitivities of 37.1% and 38.2% in men and women, respectively. The best results were obtained with the SplitBal data balancing technique: although the accuracy of the trained models decreased by 7.8% and 11.3%, their sensitivity improved significantly, to 82.3% and 73.7% in men and women, respectively.

Conclusions: The random forest learning method, combined with data balancing techniques, especially SplitBal, could create MetS prediction models with promising results that can be applied as a useful prognostic tool in health screening programs.
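The abstract does not spell out how SplitBal works. As a rough illustration only, the sketch below assumes the usual description of split-based balancing: the majority class is randomly divided into bins of about the same size as the minority class, and each bin is paired with all minority samples to form a balanced training subset. The function name `split_balance`, its parameters, and the toy labels are hypothetical and are not taken from the paper.

```python
import random


def split_balance(labels, minority_label=1, seed=0):
    """Return index lists for balanced subsets (assumed SplitBal-style split).

    Each subset holds every minority index plus one random, non-overlapping
    bin of majority indices of roughly the same size.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    # Shuffle so that each bin is a random sample of the majority class.
    rng.shuffle(majority)

    # Number of bins is about the imbalance ratio (majority / minority).
    n_bins = max(1, round(len(majority) / len(minority)))
    bins = [majority[k::n_bins] for k in range(n_bins)]

    return [sorted(minority + b) for b in bins]


# Toy labels (made up for illustration): 20 positive cases, 80 negative cases.
labels = [1] * 20 + [0] * 80
subsets = split_balance(labels)
print(len(subsets), [len(s) for s in subsets])
```

Under this assumption, one classifier (here, a random forest) would be trained on each balanced subset and the individual predictions combined, for example by averaging predicted probabilities. The combination rule used in the study may differ.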

Keywords