Microbiome (Jun 2021)

Constraining PERMANOVA and LDM to within-set comparisons by projection improves the efficiency of analyses of matched sets of microbiome data

  • Zhengyi Zhu,
  • Glen A. Satten,
  • Caroline Mitchell,
  • Yi-Juan Hu

DOI
https://doi.org/10.1186/s40168-021-01034-9
Journal volume & issue
Vol. 9, no. 1
pp. 1 – 19

Abstract

Background: Matched-set data arise frequently in microbiome studies. For example, we may collect pre- and post-treatment samples from a set of individuals, or use important confounding variables to match data from case participants to one or more control participants. Thus, there is a need for statistical methods for data comprised of matched sets, to test hypotheses against traits of interest (e.g., clinical outcomes or environmental factors) at the community level and/or the operational taxonomic unit (OTU) level. Optimally, these methods should accommodate complex data such as those with unequal sample sizes across sets, confounders varying within sets, and continuous traits of interest.

Methods: PERMANOVA is a commonly used distance-based method for testing hypotheses at the community level. We have also developed the linear decomposition model (LDM) that unifies the community-level and OTU-level tests into one framework. Here we present a new strategy that can be used with both PERMANOVA and the LDM for analyzing matched-set data. We propose to include an indicator variable for each set as covariates, so as to constrain comparisons between samples within a set, and also permute traits within each set, which can account for exchangeable sample correlations. The flexible nature of PERMANOVA and the LDM allows discrete or continuous traits or interactions to be tested, within-set confounders to be adjusted, and unbalanced data to be fully exploited.

Results: Our simulations indicate that our proposed strategy outperformed alternative strategies, including the commonly used one that utilizes restricted permutation only, in a wide range of scenarios. Using simulation, we also explored optimal designs for matched-set studies. The flexibility of PERMANOVA and the LDM for a variety of matched-set microbiome data is illustrated by the analysis of data from two real studies.

Conclusions: Including set indicator variables and permuting within sets when analyzing matched-set data with PERMANOVA or the LDM is a strategy that performs well and is capable of handling the complex data structures that frequently occur in microbiome studies.
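
The strategy described in the Methods (set indicators as covariates, plus permutation of the trait within each set) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`gower_center`, `pseudo_f`, `permute_within_sets`, `matched_set_test`) are hypothetical, the statistic is a simplified PERMANOVA-style ratio that omits the degrees-of-freedom scaling (a constant that does not change the permutation p-value), and a single trait with no additional within-set confounders is assumed.

```python
import numpy as np


def gower_center(d):
    """Gower-centre a distance matrix: G = -1/2 * J D^2 J."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * j @ (d ** 2) @ j


def hat(x):
    """Orthogonal projection onto the column space of x."""
    return x @ np.linalg.pinv(x)


def pseudo_f(g, trait, set_ind):
    """PERMANOVA-style ratio for the trait after projecting out set indicators.

    Degrees-of-freedom scaling is omitted (assumption: it is constant
    across permutations, so the p-value is unaffected).
    """
    n = g.shape[0]
    h_set = hat(set_ind)
    # Residualise the trait on the set indicators (centres it within sets),
    # so only within-set contrasts contribute to the test.
    resid = (np.eye(n) - h_set) @ trait.reshape(-1, 1)
    h_trait = hat(resid)
    r = np.eye(n) - h_set - h_trait  # projection onto the residual space
    return np.trace(h_trait @ g @ h_trait) / np.trace(r @ g @ r)


def permute_within_sets(trait, sets, rng):
    """Shuffle trait values among the samples of each matched set."""
    out = trait.copy()
    for s in np.unique(sets):
        idx = np.flatnonzero(sets == s)
        out[idx] = trait[rng.permutation(idx)]
    return out


def matched_set_test(d, trait, sets, n_perm=999, seed=0):
    """Return the observed statistic and a within-set permutation p-value."""
    rng = np.random.default_rng(seed)
    g = gower_center(d)
    # One indicator column per matched set.
    set_ind = (sets[:, None] == np.unique(sets)[None, :]).astype(float)
    observed = pseudo_f(g, trait, set_ind)
    perm = np.array([
        pseudo_f(g, permute_within_sets(trait, sets, rng), set_ind)
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
    return observed, p_value
```

For a pre/post design, `sets` would hold one label per individual and `trait` a 0/1 indicator of treatment status, with `d` any sample-by-sample distance matrix. Sets of unequal size are handled without change, because both the projection and the permutation operate set by set.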

Keywords