Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data

Nicole M. Davis; Diana M. Proctor; Susan P. Holmes; David A. Relman; Benjamin J. Callahan

doi:10.1186/s40168-018-0605-2

Microbiome (Dec 2018)

Affiliations

Nicole M. Davis: Department of Microbiology and Immunology, Stanford University School of Medicine
Diana M. Proctor: Department of Medicine, Stanford University School of Medicine
Susan P. Holmes: Department of Statistics, Stanford University
David A. Relman: Department of Microbiology and Immunology, Stanford University School of Medicine
Benjamin J. Callahan: Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University

Journal volume & issue: Vol. 6, no. 1, pp. 1–14

Abstract

Background: The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants, DNA sequences not truly present in the sample. Contaminants come from various sources, including reagents. Appropriate laboratory practices can reduce contamination, but do not eliminate it. Here we introduce decontam (https://github.com/benjjneb/decontam), an open-source R package that implements a statistical classification procedure that identifies contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples and are often found in negative controls.

Results: Decontam classified amplicon sequence variants (ASVs) in a human oral dataset consistently with prior microscopic observations of the microbial taxa inhabiting that environment and previous reports of contaminant taxa. In metagenomics and marker-gene measurements of a dilution series, decontam substantially reduced technical variation arising from different sequencing protocols. The application of decontam to two recently published datasets corroborated and extended their conclusions that little evidence existed for an indigenous placenta microbiome and that some low-frequency taxa seemingly associated with preterm birth were contaminants.

Conclusions: Decontam improves the quality of metagenomic and marker-gene sequencing by identifying and removing contaminant DNA sequences. Decontam integrates easily with existing MGS workflows and allows researchers to generate more accurate profiles of microbial communities at little to no additional cost.
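The first pattern named in the abstract, that contaminants appear at higher frequencies in low-concentration samples, can be illustrated with a small Python sketch. This is not decontam's implementation (decontam is an R package and uses its own statistic). The sketch assumes, for illustration only, that a contaminant's frequency is roughly inversely proportional to total DNA concentration while a true sequence's frequency is roughly independent of it. The function names `fit_rss` and `contaminant_score` and all data values are hypothetical.

```python
import math

def fit_rss(log_conc, log_freq, slope):
    """Residual sum of squares for a fixed-slope line with best-fit intercept."""
    n = len(log_conc)
    # With the slope fixed, the least-squares intercept is the mean residual.
    intercept = sum(f - slope * c for c, f in zip(log_conc, log_freq)) / n
    return sum((f - (slope * c + intercept)) ** 2
               for c, f in zip(log_conc, log_freq))

def contaminant_score(conc, freq):
    """Share of total misfit owed to the contaminant model (assumed slope -1).

    Values near 0 mean the contaminant-like model fits better; values near 1
    mean the concentration-independent model (slope 0) fits better.
    """
    log_conc = [math.log(c) for c in conc]
    log_freq = [math.log(f) for f in freq]
    rss_contam = fit_rss(log_conc, log_freq, -1.0)
    rss_true = fit_rss(log_conc, log_freq, 0.0)
    total = rss_contam + rss_true
    return 0.5 if total == 0 else rss_contam / total

# Hypothetical data: DNA concentration per sample, arbitrary units.
conc = [1.0, 2.0, 4.0, 8.0, 16.0]
contaminant_like = [0.1 / c for c in conc]   # frequency falls as concentration rises
true_like = [0.05] * len(conc)               # frequency unrelated to concentration

score_contaminant = contaminant_score(conc, contaminant_like)
score_true = contaminant_score(conc, true_like)
```

In this toy setup a sequence would be flagged when its score falls below some chosen cutoff. Any cutoff is an assumption of the sketch, and real data would also call for the negative-control pattern the abstract describes.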

Published in Microbiome

ISSN: 2049-2618 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Microbiology: Microbial ecology
Website: https://microbiomejournal.biomedcentral.com
