F1000Research (Nov 2016)

Disambiguate: An open-source application for disambiguating two species in next generation sequencing data from grafted samples [version 1; referees: 2 approved]

  • Miika J. Ahdesmäki,
  • Simon R. Gray,
  • Justin H. Johnson,
  • Zhongwu Lai

DOI
https://doi.org/10.12688/f1000research.10082.1
Journal volume & issue
Vol. 5

Abstract

Grafting of cell lines and primary tumours is a crucial step in the drug development process between cell line studies and clinical trials. Disambiguate is a program for computationally separating the sequencing reads of two species derived from grafted samples. It operates on alignments to the two species and separates the components with very high sensitivity and specificity, as illustrated in artificially mixed human-mouse samples. This allows maximum recovery of data from target tumours for more accurate variant calling and gene expression quantification. Because no general-use, open-source algorithm for separating data from the two species is otherwise accessible to the bioinformatics community, Disambiguate offers a novel approach to, and an improvement in, sequence analysis of grafted samples. Both Python and C++ implementations are available, and they are integrated into several open- and closed-source pipelines. Disambiguate is open source and freely available at https://github.com/AstraZeneca-NGS/disambiguate.
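
The idea of separating reads by comparing their alignments to two species can be illustrated with a minimal sketch. It is not the Disambiguate implementation: the function name `assign_reads`, the per-read score dictionaries, and the rule "higher score wins, ties are ambiguous" are all assumptions made for illustration, and the abstract does not specify which alignment quality measure the tool uses.

```python
from typing import Dict


def assign_reads(scores_a: Dict[str, int], scores_b: Dict[str, int]) -> Dict[str, str]:
    """Assign each read to a species by comparing hypothetical alignment scores.

    scores_a / scores_b map read name -> best alignment score against
    species A / species B; a read missing from a dict did not align there.
    The scoring rule is an assumption, not the tool's documented method.
    """
    assigned = {}
    for read in scores_a.keys() | scores_b.keys():
        a = scores_a.get(read)
        b = scores_b.get(read)
        if b is None or (a is not None and a > b):
            assigned[read] = "species_a"   # aligned only, or better, to A
        elif a is None or b > a:
            assigned[read] = "species_b"   # aligned only, or better, to B
        else:
            assigned[read] = "ambiguous"   # equal scores: cannot decide
    return assigned


# Toy example with made-up scores (human = A, mouse = B).
human = {"read1": 100, "read2": 60, "read3": 80}
mouse = {"read2": 95, "read3": 80, "read4": 70}
result = assign_reads(human, mouse)
```

In this toy input, `read1` and `read4` align to only one species, `read2` scores higher against the second species, and `read3` ties and is left ambiguous rather than being forced into either set.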

Keywords