BMC Medical Genomics (Sep 2024)
Prediction of metabolic syndrome using machine learning approaches based on genetic and nutritional factors: a 14-year prospective-based cohort study
Abstract
Introduction: Metabolic syndrome is a chronic disease associated with multiple comorbidities. Over the last few years, machine learning techniques have been used to predict metabolic syndrome. However, few studies have combined demographic, clinical, laboratory, dietary, and genetic factors to predict the incidence of metabolic syndrome in Koreans. In the present study, we propose using a genome-wide polygenic risk score, along with these other factors, to improve the accuracy of metabolic syndrome prediction.

Methods: We developed seven machine learning-based models, using Cox multivariable regression, deep neural network (DNN), support vector machine (SVM), stochastic gradient descent (SGD), random forest (RAF), Naïve Bayes (NBA) classifier, and AdaBoost (ADB), to predict the incidence of metabolic syndrome at year 14 using the dataset from the Korean Genome and Epidemiology Study (KoGES) Ansan and Ansung.

Results: Of the 5,440 patients, 2,120 were considered to have new-onset metabolic syndrome. For the model that included sex, age, alcohol intake, energy intake, marital status, education status, income status, smoking status, dried laver intake, and the genome-wide polygenic risk score (gPRS) Z-score based on 344,447 SNPs (p-value < 1.0), the AUC values were highest for RAF (0.994 [95% CI 0.985, 1.000]) and ADB (0.994 [95% CI 0.986, 1.000]).

Conclusions: Incorporating gPRS together with demographic, clinical, laboratory, and seaweed intake data enhanced metabolic syndrome risk prediction by capturing the distinct etiologies of metabolic syndrome development. The RAF- and ADB-based models predicted metabolic syndrome more accurately than the NBA-based model for the Korean population.
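To make the two quantities named above concrete, the following minimal sketch shows one common way a polygenic risk score can be standardized to a Z-score and how an AUC can be computed from predicted scores. It is not the authors' pipeline: the function names, the toy dosages, the per-SNP weights, and the outcome labels are all hypothetical, and the study itself used 344,447 SNPs and fitted machine learning models rather than ranking on the score alone.

```python
from statistics import mean, pstdev


def polygenic_risk_scores(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2) for each individual."""
    return [sum(d * w for d, w in zip(row, weights)) for row in dosages]


def z_scores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 4 individuals, 3 SNPs (assumed values, not study data).
dosages = [[0, 1, 2], [2, 2, 1], [1, 0, 0], [2, 1, 2]]
weights = [0.2, 0.1, 0.3]   # assumed per-SNP effect sizes
labels = [0, 1, 0, 1]       # 1 = new-onset metabolic syndrome (made up)

prs = polygenic_risk_scores(dosages, weights)
gprs_z = z_scores(prs)
toy_auc = auc(labels, gprs_z)
```

Because standardization preserves the ordering of individuals, the AUC of the raw score and of its Z-score are identical; the Z-score mainly puts the genetic predictor on a scale comparable to the other covariates.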
Keywords