Nature Communications (Jun 2024)

Discovering type I cis-AT polyketides through computational mass spectrometry and genome mining with Seq2PKS

  • Donghui Yan,
  • Muqing Zhou,
  • Abhinav Adduri,
  • Yihao Zhuang,
  • Mustafa Guler,
  • Sitong Liu,
  • Hyonyoung Shin,
  • Torin Kovach,
  • Gloria Oh,
  • Xiao Liu,
  • Yuting Deng,
  • Xiaofeng Wang,
  • Liu Cao,
  • David H. Sherman,
  • Pamela J. Schultz,
  • Roland D. Kersten,
  • Jason A. Clement,
  • Ashootosh Tripathi,
  • Bahar Behsaz,
  • Hosein Mohimani

DOI
https://doi.org/10.1038/s41467-024-49587-1
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 15

Abstract

Type 1 polyketides are a major class of natural products used as antiviral, antibiotic, antifungal, antiparasitic, immunosuppressive, and antitumor drugs. Analysis of public microbial genomes leads to the discovery of over sixty thousand type 1 polyketide gene clusters. However, the molecular products of only about a hundred of these clusters are characterized, leaving most metabolites unknown. Characterizing polyketides relies on bioactivity-guided purification, which is expensive and time-consuming. To address this, we present Seq2PKS, a machine learning algorithm that predicts chemical structures derived from Type 1 polyketide synthases. Seq2PKS predicts numerous putative structures for each gene cluster to enhance accuracy. The correct structure is identified using a variable mass spectral database search. Benchmarks show that Seq2PKS outperforms existing methods. Applying Seq2PKS to Actinobacteria datasets, we discover biosynthetic gene clusters for monazomycin, oasomycin A, and 2-aminobenzamide-actiphenol.