PLoS ONE (Jan 2015)

Bioinformatic Amplicon Read Processing Strategies Strongly Affect Eukaryotic Diversity and the Taxonomic Composition of Communities.

  • Markus Majaneva,
  • Kirsi Hyytiäinen,
  • Sirkka Liisa Varvio,
  • Satoshi Nagai,
  • Jaanika Blomster

DOI
https://doi.org/10.1371/journal.pone.0130035
Journal volume & issue
Vol. 10, no. 6
p. e0130035

Abstract

Amplicon read sequencing has revolutionized the field of microbial diversity studies. The technique has been developed for bacterial assemblages and has undergone rigorous testing with mock communities. However, due to the great complexity of eukaryotes and the numbers of different rDNA copies, analyzing eukaryotic diversity is more demanding than analyzing bacterial or mock communities, so studies are needed that test the methods of analyses on taxonomically diverse natural communities. In this study, we used 20 samples collected from the Baltic Sea ice, slush and under-ice water to investigate three program packages (UPARSE, mothur and QIIME) and 18 different bioinformatic strategies implemented in them. Our aim was to assess the impact of the initial steps of bioinformatic strategies on the results when analyzing natural eukaryotic communities. We found significant differences among the strategies in resulting read length, number of OTUs and estimates of diversity as well as clear differences in the taxonomic composition of communities. The differences arose mainly because of the variable number of chimeric reads that passed the pre-processing steps. Singleton removal and denoising substantially lowered the number of errors. Our study showed that the initial steps of the bioinformatic amplicon read processing strategies require careful consideration before applying them to eukaryotic communities.
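
To illustrate why singleton removal can change OTU counts and diversity estimates, the following minimal sketch filters a toy OTU table and compares richness and Shannon diversity before and after. It is not the authors' pipeline and does not reproduce UPARSE, mothur or QIIME. The OTU names, read counts and helper functions (`remove_singletons`, `shannon_index`) are hypothetical and chosen only for illustration.

```python
import math


def remove_singletons(otu_counts):
    """Drop OTUs supported by a single read (hypothetical helper)."""
    return {otu: n for otu, n in otu_counts.items() if n > 1}


def shannon_index(otu_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTU read proportions."""
    total = sum(otu_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log(n / total)
        for n in otu_counts.values()
        if n > 0
    )


# Toy OTU table (made-up read counts, not data from the study).
toy_table = {"OTU_1": 50, "OTU_2": 30, "OTU_3": 1, "OTU_4": 1, "OTU_5": 18}

filtered_table = remove_singletons(toy_table)

richness_before = len(toy_table)
richness_after = len(filtered_table)
shannon_before = shannon_index(toy_table)
shannon_after = shannon_index(filtered_table)

print(f"OTUs before/after singleton removal: {richness_before}/{richness_after}")
print(f"Shannon before/after: {shannon_before:.3f}/{shannon_after:.3f}")
```

In this toy case the two single-read OTUs inflate both richness and Shannon diversity. In real data such OTUs may be chimeras or sequencing errors, but they may also be genuinely rare taxa, so where to set the threshold is a judgment call.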