Microorganisms (Jan 2025)

γBMGC: A Comprehensive and Accurate Database for Screening TMAO-Associated Cardiovascular Diseases

  • Guang Yang,
  • Tiantian Tao,
  • Guohao Yu,
  • Hongqian Zhang,
  • Yiwen Wu,
  • Siqi Sun,
  • Kexin Guo,
  • Shulei Jia

DOI
https://doi.org/10.3390/microorganisms13020225
Journal volume & issue
Vol. 13, no. 2, p. 225

Abstract

Dietary L-carnitine is converted to γ-butylbetaine (γBB) in a gut-microbiota-dependent manner in humans, and γBB has been identified as an intermediate product that may be associated with incident cardiovascular diseases or major adverse events. Eliminating or reducing microbiota-dependent γBB production may therefore contribute to adjuvant therapy for cardiovascular diseases. However, our understanding of the γBB metabolic gene clusters (MGCs) and their associated microorganisms remains limited. To address this gap, we constructed a manually curated γBB metabolic gene cluster database (γBMGC) based on Hidden Markov Models (HMMs). It comprises 171,510 allelic genes from 85 species and 20 genera and supports high-resolution analysis at the strain level. On simulated gene datasets, with a 50% identity cutoff, γBMGC achieved an annotation accuracy, PPV, specificity, F1-score, and NPV of 99.4%, 97.97%, 99.16%, 98.97%, and 100%, respectively, significantly outperforming existing databases such as KEGG at similar thresholds. Compared with existing databases such as KEGG or COG, γBMGC is more accurate, more comprehensive, and faster for profiling cardiovascular disease (CVD)-associated genes at the species or strain level, and it offers higher resolution in identifying strain-specific γBB metabolic pathways. We also validated the performance of γBMGC in gene abundance analysis and bacterial species distinction. γBMGC is a powerful database for improving our understanding of the microbial L-carnitine pathway in the human gut, enabling rapid and high-accuracy analyses of the associated cardiovascular disease processes.
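
The five metrics quoted in the abstract are presumably the standard confusion-matrix quantities. The sketch below shows how they could be computed under that assumption. The class and function names (`Confusion`, `annotation_metrics`, `f1_score`) and the example counts are hypothetical illustrations, not the authors' code or data. Under the standard definitions, an NPV of 100% means there were no false negatives, so recall is 100%. The last line checks that the reported PPV is then consistent with the reported F1-score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of annotation outcomes (hypothetical container)."""
    tp: int  # target genes correctly annotated
    fp: int  # non-target genes wrongly annotated as targets
    tn: int  # non-target genes correctly rejected
    fn: int  # target genes that were missed


def f1_score(ppv: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall."""
    return 2 * ppv * recall / (ppv + recall)


def annotation_metrics(c: Confusion) -> dict[str, float]:
    """Standard metrics; assumes every denominator is non-zero."""
    ppv = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "ppv": ppv,
        "specificity": c.tn / (c.tn + c.fp),
        "f1": f1_score(ppv, recall),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for illustration only (not from the paper).
example = annotation_metrics(Confusion(tp=90, fp=2, tn=108, fn=0))
for name, value in example.items():
    print(f"{name}: {100 * value:.2f}%")

# Consistency check on the reported figures: PPV 97.97%, recall 100%.
print(f"{100 * f1_score(0.9797, 1.0):.2f}")  # -> 98.97
```

With no false negatives, F1 depends only on PPV, which is why the reported PPV of 97.97% and F1-score of 98.97% agree.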

Keywords