Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads

Marius Welzel; Anja Lange; Dominik Heider; Michael Schwarz; Bernd Freisleben; Manfred Jensen; Jens Boenigk; Daniela Beisser

doi:10.1186/s12859-020-03852-4

BMC Bioinformatics (Nov 2020)

Affiliations

Marius Welzel: Department of Mathematics and Computer Science, University of Marburg
Anja Lange: Department of Bioinformatics and Computational Biophysics, University of Duisburg-Essen
Dominik Heider: Department of Mathematics and Computer Science, University of Marburg
Michael Schwarz: Department of Mathematics and Computer Science, University of Marburg
Bernd Freisleben: Department of Mathematics and Computer Science, University of Marburg
Manfred Jensen: Department of Biodiversity, University of Duisburg-Essen
Jens Boenigk: Department of Biodiversity, University of Duisburg-Essen
Daniela Beisser: Department of Biodiversity, University of Duisburg-Essen

DOI: https://doi.org/10.1186/s12859-020-03852-4
Journal volume & issue: Vol. 21, no. 1
pp. 1 – 14

Abstract

Background: Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high-throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system.

Results: We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow contains all analysis steps, from quality assessment, read assembly, dereplication, chimera detection, split-sample merging, and sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows, which ensures reproducibility. In addition, Conda is used for version control of the utilized programs. The encapsulation of rules and their dependencies supports hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub ( https://github.com/MW55/Natrix ) or as a Docker container on DockerHub ( https://hub.docker.com/r/mw55/natrix ).

Conclusion: Natrix is a user-friendly and highly extensible workflow for processing Illumina amplicon data.
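
As an illustration of one of the analysis steps named in the abstract, dereplication can be sketched as collapsing identical reads into unique sequences with abundance counts. This is a minimal Python sketch under stated assumptions, not the implementation used in Natrix: the function name `dereplicate`, the case-insensitive comparison, and the example reads are all hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical sequences into (sequence, count) pairs.

    Assumption for this sketch: reads that differ only in letter case
    are treated as identical.
    """
    counts = Counter(seq.upper() for seq in reads)
    # Sort by descending abundance, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy input: six reads, three distinct sequences.
reads = ["ACGT", "acgt", "TTGA", "ACGT", "TTGA", "GGCC"]
unique = dereplicate(reads)

for sequence, count in unique:
    print(f"{sequence}\t{count}")
```

Keeping the abundance next to each unique sequence matters because later steps, such as chimera detection and clustering into OTUs or ASVs, typically use abundance information.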

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
