PLoS Computational Biology (Apr 2020)

Automated analysis of immunosequencing datasets reveals novel immunoglobulin D genes across diverse species.

  • Vinnu Bhardwaj,
  • Massimo Franceschetti,
  • Ramesh Rao,
  • Pavel A Pevzner,
  • Yana Safonova

DOI
https://doi.org/10.1371/journal.pcbi.1007837
Journal volume & issue
Vol. 16, no. 4
p. e1007837

Abstract

Immunoglobulin genes are formed through V(D)J recombination, which joins the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics focuses on finding alleles of germline genes across various patients. Although reconstruction of V and J genes is a well-studied problem, the more challenging task of reconstructing D genes remained open until the IgScout algorithm was developed in 2019. In this work, we address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction, apply it to hundreds of immunosequencing datasets from multiple species, and validate the newly inferred D genes by analyzing diverse whole genome sequencing datasets and haplotyping heterozygous V genes.