Scientific Reports (Jun 2024)
Exploratory risk prediction of type II diabetes with isolation forests and novel biomarkers
Abstract
Abstract Type II diabetes mellitus (T2DM) is a rising global health burden due to its rapidly increasing prevalence worldwide, and can result in serious complications. Therefore, it is of utmost importance to identify individuals at risk as early as possible to avoid long-term T2DM complications. In this study, we developed an interpretable machine learning model leveraging baseline levels of biomarkers of oxidative stress (OS), inflammation, and mitochondrial dysfunction (MD) for identifying individuals at risk of developing T2DM. In particular, Isolation Forest (iForest) was applied as an anomaly detection algorithm to address class imbalance. iForest was trained on the control group data to detect cases of high risk for T2DM development as outliers. Two iForest models were trained and evaluated through ten-fold cross-validation, the first on traditional biomarkers (BMI, blood glucose levels (BGL) and triglycerides) alone and the second including the additional aforementioned biomarkers. The second model outperformed the first across all evaluation metrics, particularly for F1 score and recall, which were increased from 0.61 ± 0.05 to 0.81 ± 0.05 and 0.57 ± 0.06 to 0.81 ± 0.08, respectively. The feature importance scores identified a novel combination of biomarkers, including interleukin-10 (IL-10), 8-isoprostane, humanin (HN), and oxidized glutathione (GSSG), which were revealed to be more influential than the traditional biomarkers in the outcome prediction. These results reveal a promising method for simultaneously predicting and understanding the risk of T2DM development and suggest possible pharmacological intervention to address inflammation and OS early in disease progression.
Keywords