VDJPipe: a pipelined tool for pre-processing immune repertoire sequencing data

Scott Christley; Mikhail K. Levin; Inimary T. Toby; John M. Fonner; Nancy L. Monson; William H. Rounds; Florian Rubelt; Walter Scarborough; Richard H. Scheuermann; Lindsay G. Cowell

doi:10.1186/s12859-017-1853-z

BMC Bioinformatics (Oct 2017)


Affiliations

Scott Christley: Department of Clinical Sciences, UT Southwestern Medical Center
Mikhail K. Levin: Bank of America Corporate Center
Inimary T. Toby: Department of Clinical Sciences, UT Southwestern Medical Center
John M. Fonner: Texas Advanced Computing Center
Nancy L. Monson: Department of Neurology and Neurotherapeutics, UT Southwestern Medical Center
William H. Rounds: Department of Clinical Sciences, UT Southwestern Medical Center
Florian Rubelt: Department of Microbiology and Immunology, Stanford University School of Medicine
Walter Scarborough: Texas Advanced Computing Center
Richard H. Scheuermann: J. Craig Venter Institute
Lindsay G. Cowell: Department of Clinical Sciences, UT Southwestern Medical Center

DOI: https://doi.org/10.1186/s12859-017-1853-z
Journal volume & issue: Vol. 18, no. 1, pp. 1–5

Abstract

Background

Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis. VDJPipe is a flexible, high-performance tool that can perform multiple pre-processing tasks in a single pass over the data files.

Results

Processing tasks provided by VDJPipe include base composition statistics calculation, read quality statistics calculation, quality filtering, homopolymer filtering, length and nucleotide filtering, paired-read merging, barcode demultiplexing, 5′ and 3′ PCR primer matching, and duplicate-read collapsing. VDJPipe uses a pipeline approach in which multiple processing steps are performed in a sequential workflow, with the output of each step passed automatically as input to the next. The workflow is flexible enough to handle the complex barcoding schemes used in many immunosequencing experiments. Because VDJPipe is designed for computational efficiency, we compared its execution times with those of pRESTO, a widely used pre-processing tool for immune repertoire sequencing data. We found that VDJPipe requires <10% of the run time required by pRESTO.

Conclusions

VDJPipe is a high-performance tool that is optimized for pre-processing large immune repertoire sequencing data sets.
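
The single-pass pipeline idea described in the abstract can be illustrated with a minimal Python sketch. This is not VDJPipe's code, API, or configuration format; every name here (`Read`, `min_average_quality`, `max_homopolymer`, `min_length`, `run_pipeline`) is hypothetical, and the thresholds are arbitrary. Each read is passed through a chain of steps exactly once, the output of one step becomes the input of the next, and reads that survive every step are collapsed by sequence.

```python
from collections import Counter
from typing import Callable, Iterable, NamedTuple, Optional, Sequence, Tuple


class Read(NamedTuple):
    """Hypothetical in-memory read: name, bases, per-base Phred scores."""
    name: str
    seq: str
    quals: Tuple[int, ...]


# A step returns the (possibly modified) read, or None to discard it.
Step = Callable[[Read], Optional[Read]]


def min_average_quality(threshold: float) -> Step:
    """Quality filter: keep reads whose mean Phred score is >= threshold."""
    def step(read: Read) -> Optional[Read]:
        if not read.quals:
            return None
        mean = sum(read.quals) / len(read.quals)
        return read if mean >= threshold else None
    return step


def max_homopolymer(limit: int) -> Step:
    """Homopolymer filter: drop reads with a single-base run longer than limit."""
    def step(read: Read) -> Optional[Read]:
        longest = run = 0
        previous = ""
        for base in read.seq:
            run = run + 1 if base == previous else 1
            longest = max(longest, run)
            previous = base
        return read if longest <= limit else None
    return step


def min_length(n: int) -> Step:
    """Length filter: keep reads with at least n bases."""
    def step(read: Read) -> Optional[Read]:
        return read if len(read.seq) >= n else None
    return step


def run_pipeline(reads: Iterable[Read], steps: Sequence[Step]) -> Counter:
    """Apply all steps to each read in one pass, then collapse duplicates.

    Returns a Counter mapping each surviving sequence to its read count.
    """
    unique: Counter = Counter()
    for read in reads:
        current: Optional[Read] = read
        for step in steps:
            current = step(current)
            if current is None:
                break  # read was filtered out; skip remaining steps
        if current is not None:
            unique[current.seq] += 1
    return unique
```

Because `reads` can be any iterable, including a generator that parses a FASTQ file lazily, the input is traversed only once regardless of how many steps are chained. Tasks such as paired-read merging, barcode demultiplexing, and primer matching are omitted from this sketch.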

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
