School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541214, China
Margit Heier
KORA Study Centre, University Hospital of Augsburg, 86153 Augsburg, Germany
Gabi Karstenmüller
Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
Karsten Suhre
Department of Physiology and Biophysics, Weill Cornell Medicine; Director of the Bioinformatics Core; Doha 24144, Qatar
Christian Gieger
Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
Wolfgang Koenig
Deutsches Herzzentrum München, Technische Universität München, 80636 München, Germany
Wolfgang Rathmann
Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research, Heinrich Heine University, 40225 Düsseldorf, Germany
Annette Peters
Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
Rui Wang-Sattler
Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
Accurate risk prediction for myocardial infarction (MI) is crucial for preventive strategies, given the substantial contribution of MI to global mortality and morbidity. Here, we propose a novel deep-learning approach that improves the prediction of incident MI by incorporating metabolomics alongside clinical risk factors. We used data from the KORA cohort, comprising the baseline S4 and follow-up F4 studies, with 1454 participants who had no prior history of MI. The dataset included 19 clinical variables and 363 metabolites. Because the dataset is highly imbalanced (78 observed incident MI cases versus 1376 non-MI individuals), we employed a generative adversarial network (GAN) to generate synthetic incident cases, augmenting the dataset and improving feature representation. To predict MI, we further used multi-layer perceptron (MLP) models combined with the synthetic minority oversampling technique (SMOTE) and edited nearest neighbor (ENN) methods, which mitigate the overfitting and underfitting that commonly arise with imbalanced data. To further enhance prediction accuracy, we propose a novel GAN for feature-enhanced (GFE) loss function. The GFE loss function improved prediction accuracy by approximately 2%, yielding a final accuracy of 70%. Furthermore, we evaluated the contribution of each clinical variable and metabolite to the predictive model and identified the 10 most important variables, including glucose tolerance, sex, and physical activity. This is the first study to apply a deep-learning approach with the newly proposed loss function to produce 7-year MI predictions. Our findings demonstrate the promising potential of this technique for identifying novel biomarkers for MI prediction.
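The abstract pairs SMOTE oversampling with ENN cleaning to counter the 78 versus 1376 class imbalance. The sketch below illustrates only that resampling idea, in plain Python on hypothetical toy 2-D data. It is not the authors' pipeline: the GAN, the GFE loss, and the MLP are omitted. The helper names (`smote`, `enn`, `_neighbors`), the neighbour counts `k`, and the choice to oversample the minority class to parity are all illustrative assumptions. ENN here uses Wilson's majority-vote rule. In practice, a maintained implementation such as `SMOTEENN` from the imbalanced-learn package would normally be used instead.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _neighbors(i, X, k):
    """Indices of the k nearest neighbours of X[i] within X, excluding i itself."""
    others = (j for j in range(len(X)) if j != i)
    return sorted(others, key=lambda j: _dist(X[i], X[j]))[:k]


def smote(minority, n_new, k=5, rng=None):
    """SMOTE sketch: create n_new synthetic samples, each placed at a random
    point on the segment between a minority sample and one of its k nearest
    minority-class neighbours. Requires at least two minority samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(_neighbors(i, minority, k))
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def enn(X, y, k=3):
    """Edited nearest neighbours (majority-vote rule): drop every sample whose
    label disagrees with the majority label of its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i in range(len(X)):
        labels = [y[j] for j in _neighbors(i, X, k)]
        majority = max(set(labels), key=labels.count)
        if majority == y[i]:
            keep_X.append(X[i])
            keep_y.append(y[i])
    return keep_X, keep_y


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: a large "non-MI" cluster and a small "MI" cluster.
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(250)]
    minority = [[rng.gauss(2.5, 1), rng.gauss(2.5, 1)] for _ in range(15)]
    # Assumption: oversample the minority class up to parity with the majority.
    synthetic = smote(minority, n_new=len(majority) - len(minority), k=5, rng=rng)
    X = majority + minority + synthetic
    y = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
    X_clean, y_clean = enn(X, y, k=3)
    print("after SMOTE:", y.count(0), y.count(1))
    print("after SMOTE+ENN:", y_clean.count(0), y_clean.count(1))
```

SMOTE is applied first so that the minority class is no longer swamped. ENN then removes points from either class that sit inside the other class's region, which tends to sharpen the decision boundary that a downstream classifier such as an MLP has to learn.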