Microbiology Spectrum (Feb 2024)

Predicting fungal secondary metabolite activity from biosynthetic gene cluster data using machine learning

  • Olivia Riedling,
  • Allison S. Walker,
  • Antonis Rokas

DOI
https://doi.org/10.1128/spectrum.03400-23
Journal volume & issue
Vol. 12, no. 2

Abstract

Read online

ABSTRACTFungal secondary metabolites (SMs) contribute to the diversity of fungal ecological communities, niches, and lifestyles. Many fungal SMs have one or more medically and industrially important activities (e.g., antifungal, antibacterial, and antitumor). The genes necessary for fungal SM biosynthesis are typically located right next to each other in the genome and are known as biosynthetic gene clusters (BGCs). However, whether fungal SM bioactivity can be predicted from specific attributes of genes in BGCs remains an open question. We adapted machine learning models that predicted SM bioactivity from bacterial BGC data with accuracies as high as 80% to fungal BGC data. We trained our models to predict the antibacterial, antifungal, and cytotoxic/antitumor bioactivity of fungal SMs on two data sets: (i) fungal BGCs (data set comprised of 314 BGCs) and (ii) fungal (314 BGCs) and bacterial BGCs (1,003 BGCs). We found that models trained on fungal BGCs had balanced accuracies between 51% and 68%, whereas training on bacterial and fungal BGCs had balanced accuracies between 56% and 68%. The low prediction accuracy of fungal SM bioactivities likely stems from the small size of the data set; this lack of data, coupled with our finding that including bacterial BGC data in the training data did not substantially change accuracies currently limits the application of machine learning approaches to fungal SM studies. With >15,000 characterized fungal SMs, millions of putative BGCs in fungal genomes, and increased demand for novel drugs, efforts that systematically link fungal SM bioactivity to BGCs are urgently needed.IMPORTANCEFungi are key sources of natural products and iconic drugs, including penicillin and statins. DNA sequencing has revealed that there are likely millions of biosynthetic pathways in fungal genomes, but the chemical structures and bioactivities of >99% of natural products produced by these pathways remain unknown. We used artificial intelligence to predict the bioactivities of diverse fungal biosynthetic pathways. We found that the accuracies of our predictions were generally low, between 51% and 68%, likely because the natural products and bioactivities of only very few fungal pathways are known. With >15,000 characterized fungal natural products, millions of putative biosynthetic pathways present in fungal genomes, and increased demand for novel drugs, our study suggests that there is an urgent need for efforts that systematically identify fungal biosynthetic pathways, their natural products, and their bioactivities.

Keywords