PLoS Computational Biology (Jun 2023)

Latent Dirichlet Allocation modeling of environmental microbiomes

  • Anastasiia Kim,
  • Sanna Sevanto,
  • Eric R. Moore,
  • Nicholas Lubbers

Journal volume & issue
Vol. 19, no. 6

Abstract

Interactions between stressed organisms and their microbiome environments may provide new routes for understanding and controlling biological systems. However, microbiomes are a form of high-dimensional data, with thousands of taxa present in any given sample, which makes untangling the interaction between an organism and its microbial environment a challenge. Here we apply Latent Dirichlet Allocation (LDA), a technique for language modeling, which decomposes the microbial communities into a set of topics (non-mutually-exclusive sub-communities) that compactly represent the distribution of full communities. LDA provides a lens into the microbiome at broad and fine-grained taxonomic levels, which we show on two datasets. In the first dataset, from the literature, we show how LDA topics succinctly recapitulate many results from a previous study on diseased coral species. We then apply LDA to a new dataset of maize soil microbiomes under drought, and find a large number of significant associations between the microbiome topics and plant traits, as well as associations between the microbiome and the experimental factors, e.g. watering level. This yields new information on the plant-microbial interactions in maize and shows that the LDA technique is useful for studying the coupling between microbiomes and stressed organisms.

Author summary

Host-microbe interaction may be an important factor determining the performance and survival of an organism under stress. Understanding how microbiomes influence organisms under stress is a challenging new area of research because microbiomes are complex, with the potential for complex responses and adaptations to stress that influence their interactions with other stressed organisms.
We show the use of LDA, a data-science technique, in the context of environmental microbial datasets to break down the thousands of microbes present in samples into groups called topics; each topic is a group of organisms that occur together as a common pattern in the dataset. We show that this technique, combined with correlation analyses, provides a way to view a very large set of microbes in a sample as a smaller, more manageable set of communities of microbial taxa related to experimental conditions and plant traits. In this way, LDA helps to unravel complex interactions between organisms and their microbiome, which could help to better predict the behavior of real-world ecological systems, and perhaps support them through the many challenges brought by a changing climate and environment.
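
To make the idea of decomposing samples into topics concrete, the following is a minimal sketch of LDA fitted to a sample-by-taxon count matrix. It uses collapsed Gibbs sampling, a standard inference scheme for LDA; the text above does not say which inference method or software the authors used, so this should not be read as their implementation. The function name `fit_lda_gibbs`, the hyperparameters `alpha` and `beta`, and the small `counts` matrix are all hypothetical and chosen only for illustration.

```python
import random


def fit_lda_gibbs(counts, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, hypothetical helper).

    counts[d][w] is the number of reads of taxon w observed in sample d.
    Returns (theta, phi):
      theta[d][k] = estimated proportion of topic k in sample d
      phi[k][w]   = estimated proportion of taxon w within topic k
    """
    rng = random.Random(seed)
    n_samples, n_taxa = len(counts), len(counts[0])

    # Expand the count matrix into individual read tokens (sample, taxon).
    tokens = [(d, w)
              for d in range(n_samples)
              for w in range(n_taxa)
              for _ in range(counts[d][w])]

    # Random initial topic assignment for every token, plus count tables.
    z = [rng.randrange(n_topics) for _ in tokens]
    ndk = [[0] * n_topics for _ in range(n_samples)]  # sample-topic counts
    nkw = [[0] * n_taxa for _ in range(n_topics)]     # topic-taxon counts
    nk = [0] * n_topics                               # tokens per topic
    for (d, w), k in zip(tokens, z):
        ndk[d][k] += 1
        nkw[k][w] += 1
        nk[k] += 1

    for _ in range(n_iter):
        for i, (d, w) in enumerate(tokens):
            # Remove the token's current assignment from the counts.
            k = z[i]
            ndk[d][k] -= 1
            nkw[k][w] -= 1
            nk[k] -= 1
            # Conditional probability of each topic for this token
            # (unnormalized; random.choices normalizes the weights).
            weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                       / (nk[t] + n_taxa * beta)
                       for t in range(n_topics)]
            k = rng.choices(range(n_topics), weights=weights)[0]
            # Record the newly sampled assignment.
            z[i] = k
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    # Point estimates from the final state of the sampler.
    theta = [[(ndk[d][k] + alpha) / (sum(ndk[d]) + n_topics * alpha)
              for k in range(n_topics)]
             for d in range(n_samples)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + n_taxa * beta)
            for w in range(n_taxa)]
           for k in range(n_topics)]
    return theta, phi


# Synthetic toy data (not from the study): 6 samples x 6 taxa, where the
# first three samples are dominated by taxa 0-2 and the last three by taxa 3-5.
counts = [
    [20, 15, 10, 0, 0, 0],
    [18, 12, 14, 1, 0, 0],
    [22, 10, 12, 0, 1, 0],
    [0, 0, 1, 16, 14, 12],
    [0, 1, 0, 15, 18, 10],
    [1, 0, 0, 12, 13, 20],
]

theta, phi = fit_lda_gibbs(counts, n_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 3) for p in row])
```

Each row of `theta` is a compact description of one sample as a mixture of topics, and each row of `phi` lists the taxa that make up a topic. In an analysis like the one described above, the columns of `theta` would be the quantities correlated with plant traits or experimental factors such as watering level.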