Environmental Microbiome (Aug 2024)

MICROPHERRET: MICRObial PHEnotypic tRait ClassifieR using Machine lEarning Techniques

  • Edoardo Bizzotto,
  • Sofia Fraulini,
  • Guido Zampieri,
  • Esteban Orellana,
  • Laura Treu,
  • Stefano Campanaro

DOI
https://doi.org/10.1186/s40793-024-00600-6
Journal volume & issue
Vol. 19, no. 1
pp. 1 – 20

Abstract

Read online

Abstract Background In recent years, there has been a rapid increase in the number of microbial genomes reconstructed through shotgun sequencing, and obtained by newly developed approaches including metagenomic binning and single-cell sequencing. However, our ability to functionally characterize these genomes by experimental assays is orders of magnitude less efficient. Consequently, there is a pressing need for the development of swift and automated strategies for the functional classification of microbial genomes. Results The present work leverages a suite of supervised machine learning algorithms to establish a range of 86 metabolic and other ecological functions, such as methanotrophy and plastic degradation, starting from widely obtainable microbial genome annotations. Tests performed on independent datasets demonstrated robust performance across complete, fragmented, and incomplete genomes above a 70% completeness level for most of the considered functions. Application of the algorithms to the Biogas Microbiome database yielded predictions broadly consistent with current biological knowledge and correctly detecting functionally-related nuances of archaeal genomes. Finally, a case study focused on acetoclastic methanogenesis demonstrated how the developed machine learning models can be refined or expanded with models describing novel functions of interest. Conclusions The resulting tool, MICROPHERRET, incorporates a total of 86 models, one for each tested functional class, and can be applied to high-quality microbial genomes as well as to low-quality genomes derived from metagenomics and single-cell sequencing. MICROPHERRET can thus aid in understanding the functional role of newly generated genomes within their micro-ecological context.

Keywords