BMC Bioinformatics (Jul 2020)

PyIR: a scalable wrapper for processing billions of immunoglobulin and T cell receptor sequences using IgBLAST

  • Cinque Soto,
  • Jessica A. Finn,
  • Jordan R. Willis,
  • Samuel B. Day,
  • Robert S. Sinkovits,
  • Taylor Jones,
  • Samuel Schmitz,
  • Jens Meiler,
  • Andre Branchizio,
  • James E. Crowe

DOI
https://doi.org/10.1186/s12859-020-03649-5
Journal volume & issue
Vol. 21, no. 1
pp. 1 – 7

Abstract

Background: Recent advances in DNA sequencing technologies have enabled significant leaps in the capacity to generate large volumes of DNA sequence data, which has spurred rapid growth in the use of bioinformatics as a means of interrogating antibody variable gene repertoires. Common tools used for annotation of antibody sequences are often limited in functionality, modularity, and usability.

Results: We have developed PyIR, a Python wrapper and library for IgBLAST, which offers a minimal-setup CLI and API, FASTQ support, file chunking for large sequence files, JSON and Python dictionary output, and built-in sequence filtering.

Conclusions: PyIR offers improved processing speed over multithreaded IgBLAST (version 1.14) when spawning more than 16 processes on a single computer system. Its customizable filtering and data encapsulation allow it to be adapted to a wide range of computing environments. The API allows IgBLAST to be used in customized bioinformatics workflows.
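The file chunking mentioned in the Results can be illustrated with a minimal Python sketch. This is not PyIR's implementation or API: the function names `read_fasta` and `chunk_records` and the chunk size are hypothetical, chosen only to show how a large sequence file could be split into fixed-size batches that separate worker processes might each annotate.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record type for this sketch: (header, sequence).
Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(parts)


def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size items, lazily."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    example = [">seq1", "ACGT", "AC", ">seq2", "GG", ">seq3", "TTTT"]
    for number, chunk in enumerate(chunk_records(read_fasta(example), 2), start=1):
        print(number, [header for header, _ in chunk])
```

Because both functions are generators, only one chunk is held in memory at a time, which is the property that matters for very large inputs. In a real workflow each chunk would be written to a temporary file or handed to a worker process.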

Keywords