mBio (Feb 2024)

Nine (not so simple) steps: a practical guide to using machine learning in microbial ecology

  • Corinne Walsh,
  • Elías Stallard-Olivera,
  • Noah Fierer

DOI
https://doi.org/10.1128/mbio.02050-23
Journal volume & issue
Vol. 15, no. 2

Abstract

Due to the complex nature of microbiome data, the field of microbial ecology has many current and potential uses for machine learning (ML) modeling. With the increased use of predictive ML models across many disciplines, including microbial ecology, there is extensive published information on the specific ML algorithms available and how those algorithms have been applied. Thus, our goal is not to summarize the breadth of ML models available or compare their performances. Rather, our goal is to provide more concrete and actionable information to guide microbial ecologists in how to select, run, and interpret ML algorithms to predict the taxa or genes associated with particular sample categories or environmental gradients of interest. Such microbial data often have unique characteristics that require careful consideration of how to apply ML models and how to interpret the associated results. This review is intended for practicing microbial ecologists who may be unfamiliar with some of the intricacies of ML models. We provide examples and discuss common opportunities and pitfalls specific to applying ML models to the types of data sets most frequently collected by microbial ecologists.

Keywords