Frontiers in Immunology (Dec 2012)

Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

  • Miri Michaeli,
  • Hila Noga,
  • Hilla Tabibian-Keissar,
  • Iris Barshack,
  • Ramit Mehr

DOI
https://doi.org/10.3389/fimmu.2012.00386
Journal volume & issue
Vol. 3

Abstract

High-throughput sequencing (HTS) yields tens of thousands to millions of sequences, which require extensive pre-processing to remove various artifacts. Such cleaning cannot be performed manually, and existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program implementing a simple cleaning procedure that successfully handles pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier), a program for distinguishing legitimate insertions and/or deletions (indels) from artifactual ones. Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier are implemented in Java and distributed as executable JAR files that run on Linux and MS Windows. The programs have no special requirements beyond correctly constructed input files, as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.
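The abstract does not state the criteria Ig-Indel-Identifier uses to separate legitimate from artifactual indels. As an illustration only, the sketch below applies one commonly used heuristic for 454 data, a platform known to miscount homopolymer lengths: an indel consisting of a single repeated base that lies inside a homopolymer run is flagged as a likely sequencing artifact, and anything else is kept as a candidate legitimate indel. The function names `run_length` and `classify_indel` and the `min_run=3` threshold are hypothetical choices made for this example, not the program's actual interface or rules.

```python
def run_length(seq: str, pos: int, base: str) -> int:
    """Length of the run of `base` in `seq` that contains or touches index `pos`."""
    left = pos
    while left > 0 and seq[left - 1] == base:
        left -= 1
    right = pos
    while right < len(seq) and seq[right] == base:
        right += 1
    return right - left


def classify_indel(seq: str, pos: int, bases: str, min_run: int = 3) -> str:
    """Hypothetical homopolymer heuristic (not the published Ig-Indel-Identifier rule).

    `seq` is the sequence that contains the indel bases: the read for an
    insertion, the germline/reference for a deletion. `pos` is the 0-based
    start of those bases in `seq`; `bases` are the inserted or deleted bases.
    """
    # Indels of mixed bases are not explained by homopolymer miscounting.
    if len(set(bases)) != 1:
        return "candidate"
    # A single-base-type indel inside a long enough run is a likely 454 artifact.
    if run_length(seq, pos, bases[0]) >= min_run:
        return "artifact"
    return "candidate"
```

For example, an extra `T` inserted into the `TTTT` run of the read `ACGTTTTGCA` (at index 4) falls in a run of length 4 and is classified as `"artifact"`, whereas a deleted `ATC` triplet is classified as `"candidate"`. A real tool would likely combine such a rule with other evidence (for instance, whether the indel preserves the reading frame), which this sketch deliberately omits.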

Keywords