mBio (Jun 2020)

Critical Relevance of Stochastic Effects on Low-Bacterial-Biomass 16S rRNA Gene Analysis

  • John R. Erb-Downward,
  • Nicole R. Falkowski,
  • Jennifer C. D’Souza,
  • Lisa M. McCloskey,
  • Roderick A. McDonald,
  • Christopher A. Brown,
  • Kerby Shedden,
  • Robert P. Dickson,
  • Christine M. Freeman,
  • Kathleen A. Stringer,
  • Betsy Foxman,
  • Gary B. Huffnagle,
  • Jeffrey L. Curtis,
  • Sara D. Adar

DOI
https://doi.org/10.1128/mBio.00258-20
Journal volume & issue
Vol. 11, no. 3

Abstract

ABSTRACT The bacterial microbiome of human body sites, previously considered sterile, remains highly controversial because it can be challenging to isolate signal from noise when low-biomass samples are being analyzed. We tested the hypothesis that stochastic sequencing noise, separable from reagent contamination, is generated during sequencing on the Illumina MiSeq platform when DNA input is below a critical threshold. We first purified DNA from serial dilutions of Pseudomonas aeruginosa and from negative controls using three DNA purification kits, quantified input using droplet digital PCR, and then sequenced the 16S rRNA gene in four technical replicates. This process identified reproducible contaminant signal that was separable from an irreproducible stochastic noise, which occurred as bacterial biomass of samples decreased. This approach was then applied to authentic respiratory samples from healthy individuals (n = 22) that ranged from high to ultralow bacterial biomass. Using oral rinse, bronchoalveolar lavage (BAL) fluid, and exhaled breath condensate (EBC) samples and matched controls, we were able to demonstrate (i) that stochastic noise dominates sequencing in real-world low-bacterial-biomass samples that contain fewer than 10^4 copies of the 16S rRNA gene per sample, (ii) that critical examination of the community composition of technical replicates can be used to separate signal from noise, and (iii) that EBC is an irreproducible sampling modality for sampling the microbiome of the lower airways. We anticipate that these results combined with suggested methods for identifying and dealing with noisy communities will facilitate increased reproducibility while simultaneously permitting characterization of potentially important low-biomass communities.

IMPORTANCE DNA contamination from external sources (reagents, environment, operator, etc.) has long been assumed to be the main cause of spurious signals that appear under low-bacterial-biomass conditions. Here, we demonstrate that contamination can be separated from another, random signal generated during low-biomass-sample sequencing. This stochastic noise is not reproduced between technical replicates; however, results for any one replicate taken alone could look like a microbial community different from the controls. Using this information, we investigated respiratory samples from healthy humans and determined the narrow range of bacterial biomass where samples transition from producing reproducible microbial sequences to ones dominated by noise. We present a rigorous approach to studies involving low-bacterial-biomass samples to detect this source of noise and provide a framework for deciding if a sample is likely to be dominated by noise. We anticipate that this work will facilitate increased reproducibility in the characterization of potentially important low-biomass communities.
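
The replicate-comparison idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each technical replicate is available as a dictionary of per-taxon read counts, uses Bray-Curtis dissimilarity as one possible measure of agreement between replicates (the abstract does not name a metric), and applies a hypothetical dissimilarity cutoff of 0.5 chosen only for illustration. The 10^4-copy value is the one figure taken from the abstract; all function and constant names are made up for this example.

```python
from itertools import combinations

# From the abstract: noise dominates below ~10^4 16S rRNA gene copies per sample.
COPY_THRESHOLD = 1e4
# Hypothetical cutoff for illustration only; not a value reported in the source.
MAX_MEAN_DISSIMILARITY = 0.5


def relative_abundance(counts):
    """Convert per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def mean_replicate_dissimilarity(replicates):
    """Average pairwise dissimilarity across the technical replicates of one sample."""
    if len(replicates) < 2:
        raise ValueError("need at least two technical replicates")
    profiles = [relative_abundance(r) for r in replicates]
    pairs = list(combinations(profiles, 2))
    return sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)


def likely_noise_dominated(copies_16s, replicates):
    """Flag a sample if its input is low or its replicates disagree strongly."""
    return (
        copies_16s < COPY_THRESHOLD
        or mean_replicate_dissimilarity(replicates) > MAX_MEAN_DISSIMILARITY
    )
```

In this sketch, a sample whose four replicates return the same community and whose measured input exceeds the copy threshold would pass, while a sample whose replicates each return a different dominant taxon would be flagged regardless of input. Any real analysis would need its metric and cutoffs calibrated against dilution series and negative controls.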

Keywords