Bracken: estimating species abundance in metagenomics data

Jennifer Lu; Florian P. Breitwieser; Peter Thielen; Steven L. Salzberg

doi:10.7717/peerj-cs.104

PeerJ Computer Science (Jan 2017)

Affiliations

Jennifer Lu: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
Florian P. Breitwieser: Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, United States
Peter Thielen: Applied Physics Laboratory, Johns Hopkins University, Laurel, MD, United States
Steven L. Salzberg: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States

Journal volume & issue: Vol. 3, p. e104

Abstract

Metagenomic experiments attempt to characterize microbial communities using high-throughput DNA sequencing. Identification of the microorganisms in a sample provides information about the genetic profile, population structure, and role of microorganisms within an environment. Until recently, most metagenomics studies focused on high-level characterization at the level of phyla, or alternatively sequenced the 16S ribosomal RNA gene that is present in bacterial species. As the cost of sequencing has fallen, though, metagenomics experiments have increasingly used unbiased shotgun sequencing to capture all the organisms in a sample. This approach requires a method for estimating abundance directly from the raw read data. Here we describe a fast, accurate new method that computes the abundance at the species level using the reads collected in a metagenomics experiment. Bracken (Bayesian Reestimation of Abundance after Classification with KrakEN) uses the taxonomic assignments made by Kraken, a very fast read-level classifier, along with information about the genomes themselves to estimate abundance at the species level, the genus level, or above. We demonstrate that Bracken can produce accurate species- and genus-level abundance estimates even when a sample contains multiple near-identical species.
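The abstract describes Bracken's reestimation step only at a high level. To illustrate the general idea of Bayesian reestimation after classification, the sketch below takes reads that a classifier such as Kraken could only assign at the genus level and splits them among that genus's species using Bayes' rule, P(species | genus) ∝ P(genus | species) · P(species). Everything here is a hypothetical illustration rather than the published Bracken algorithm or code. This includes the function name `redistribute`, its inputs, the per-species probability that a read is left at the genus level (assumed to come from the reference genomes), and the choice to estimate the species prior from reads assigned directly to each species.

```python
"""Hypothetical sketch of Bayesian redistribution of genus-level reads.

Not Bracken's published implementation; names and inputs are assumptions.
"""
from typing import Dict


def redistribute(
    genus_reads: int,
    species_reads: Dict[str, int],
    p_genus_given_species: Dict[str, float],
) -> Dict[str, float]:
    """Split reads classified only at a genus among that genus's species.

    genus_reads: reads the classifier assigned to the genus node itself.
    species_reads: reads the classifier assigned directly to each species.
    p_genus_given_species: assumed probability that a read drawn from each
        species is classified only at the genus level (e.g. estimated from
        the reference genomes in the database).
    """
    if not species_reads:
        return {}

    total = sum(species_reads.values())
    if total == 0:
        # No species-level evidence: fall back to a uniform prior.
        prior = {s: 1.0 / len(species_reads) for s in species_reads}
    else:
        # Prior P(species) estimated from directly assigned reads.
        prior = {s: n / total for s, n in species_reads.items()}

    # Bayes' rule up to a constant: P(s | genus) is proportional to P(genus | s) * P(s).
    weights = {s: p_genus_given_species[s] * prior[s] for s in species_reads}
    norm = sum(weights.values())

    estimates: Dict[str, float] = {}
    for s, n in species_reads.items():
        # If every likelihood is zero, fall back to the prior alone.
        share = weights[s] / norm if norm > 0 else prior[s]
        estimates[s] = n + share * genus_reads
    return estimates


if __name__ == "__main__":
    # Two near-identical species; species B's reads are more often left at genus level.
    print(redistribute(100, {"A": 80, "B": 20}, {"A": 0.1, "B": 0.9}))
```

In this toy run, species B ends up with most of the 100 genus-level reads even though it has fewer directly assigned reads, because its reads are assumed to be far more likely to stop at the genus node. This is the kind of correction that matters when a sample contains near-identical species.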

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/