Scientific Reports (Dec 2022)

Machine learning-derived gut microbiome signature predicts fatty liver disease in the presence of insulin resistance

  • Baeki E. Kang,
  • Aron Park,
  • Hyekyung Yang,
  • Yunju Jo,
  • Tae Gyu Oh,
  • Seung Min Jeong,
  • Yosep Ji,
  • Hyung‐Lae Kim,
  • Han‐Na Kim,
  • Johan Auwerx,
  • Seungyoon Nam,
  • Cheol-Young Park,
  • Dongryeol Ryu

DOI
https://doi.org/10.1038/s41598-022-26102-4
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 12

Abstract

Read online

Abstract A simple predictive biomarker for fatty liver disease is required for individuals with insulin resistance. Here, we developed a supervised machine learning-based classifier for fatty liver disease using fecal 16S rDNA sequencing data. Based on the Kangbuk Samsung Hospital cohort (n = 777), we generated a random forest classifier to predict fatty liver diseases in individuals with or without insulin resistance (n = 166 and n = 611, respectively). The model performance was evaluated based on metrics, including accuracy, area under receiver operating curve (AUROC), kappa, and F1-score. The developed classifier for fatty liver diseases performed better in individuals with insulin resistance (AUROC = 0.77). We further optimized the classifiers using genetic algorithm. The improved classifier for insulin resistance, consisting of ten microbial genera, presented an advanced classification (AUROC = 0.93), whereas the improved classifier for insulin-sensitive individuals failed to distinguish participants with fatty liver diseases from the healthy. The classifier for individuals with insulin resistance was comparable or superior to previous methods predicting fatty liver diseases (accuracy = 0.83, kappa = 0.50, F1-score = 0.89), such as the fatty liver index. We identified the ten genera as a core set from the human gut microbiome, which could be a diagnostic biomarker of fatty liver diseases for insulin resistant individuals. Collectively, these findings indicate that the machine learning classifier for fatty liver diseases in the presence of insulin resistance is comparable or superior to commonly used methods.