Clinical Epidemiology and Global Health (Nov 2023)

Machine learning-driven early biomarker prediction for type 2 diabetes mellitus associated coronary artery diseases

  • Shraddha Jangili,
  • Hariprasad Vavilala,
  • Gopi Sumanth Bhaskar Boddeda,
  • Suryanaryana Murty Upadhyayula,
  • Ramu Adela,
  • Srinivasa Rao Mutheneni

Journal volume & issue
Vol. 24
p. 101433

Abstract

Background: Non-communicable diseases such as type 2 diabetes mellitus (T2DM) and coronary artery disease (CAD) place a significant burden on health care systems. The present study aims to implement machine learning (ML) based predictive models for T2DM, CAD and their co-occurrence, and to identify the key variables associated with disease occurrence.

Methods: The study takes a data-driven approach, applying various supervised ML models to predict disease occurrence from biochemical, demographic and physical data collected from 123 subjects. Accuracy and the area under the curve (AUC) were used to evaluate the classification performance of the ML models.

Results: The data (n = 123 subjects) comprise 83 males and 40 females aged 35 to 70 years. Among all ML models, Random Forest performed best, with an accuracy of 76% (AUC: 0.95). Similarly, in the T2DM (AUC: 0.92) and T2DM + CAD (AUC: 0.94) classifications, Random Forest achieved the highest accuracy, followed by logistic regression for the CAD data (AUC: 0.98). The major risk factors identified for T2DM, CAD and T2DM + CAD are HbA1c, FBS, CK-MB, APO.AII, APO.E, IP.10, and total cholesterol.

Conclusion: The results suggest that ML algorithms can predict T2DM, CAD and their co-occurrence in the population by integrating biochemical, physical and demographic factors. Based on the identified risk factors, preventive measures can be drafted to reduce the disease burden.
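The two evaluation metrics named in the Methods can be illustrated with a minimal sketch. It assumes that AUC refers to the area under the ROC curve for a binary classifier, which the abstract does not state explicitly. The function names and the toy labels and scores are hypothetical and are not taken from the study's data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = disease present, 0 = disease absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.7, 0.1]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # fixed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"AUC      = {roc_auc(y_true, scores):.4f}")
```

On this toy data the accuracy is 0.75 while the AUC is 0.9375. Accuracy depends on the chosen decision threshold, whereas AUC measures how well the scores rank positive cases above negative ones, so the two metrics can differ noticeably for the same model.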

Keywords