BMC Medical Genomics (Jan 2019)

Detecting virus integration sites based on multiple related sequencing data by VirTect

  • Yuchao Xia,
  • Yun Liu,
  • Minghua Deng,
  • Ruibin Xi

DOI
https://doi.org/10.1186/s12920-018-0461-8
Journal volume & issue
Vol. 12, no. S1
pp. 157 – 165

Abstract

Background: Because tumors often show a high level of intra-tumor heterogeneity, multiple tumor samples taken from the same patient at different locations or time points are often sequenced to study this heterogeneity or tumor evolution. In virus-related tumors, such as human papillomavirus- and hepatitis B virus-related tumors, virus genome integrations can be critical driving events. It is thus important to investigate the integration sites of the virus genomes. A few algorithms for detecting virus integration sites from high-throughput sequencing data have been developed, but their limited sensitivity and specificity and their computational cost hinder their application to sequencing data from multiple related tumor samples.

Results: We develop VirTect for detecting virus integration sites simultaneously from multiple related-sample data. The algorithm is mainly based on the joint analysis of short reads spanning breakpoints of integration sites from multiple samples. To achieve high specificity and breakpoint accuracy, a local precise sandwich alignment algorithm is used. Simulation and real data analyses show that, compared with other algorithms, VirTect is significantly more sensitive and has a similar or lower false discovery rate.

Conclusions: VirTect can provide more accurate breakpoint positions and is computationally much more efficient in terms of both memory requirement and computational time.

Keywords