Genome Biology (Mar 2018)

FusorSV: an algorithm for optimally combining data from multiple structural variation detection methods

  • Timothy Becker,
  • Wan-Ping Lee,
  • Joseph Leone,
  • Qihui Zhu,
  • Chengsheng Zhang,
  • Silvia Liu,
  • Jack Sargent,
  • Kritika Shanker,
  • Adam Mil-homens,
  • Eliza Cerveira,
  • Mallory Ryan,
  • Jane Cha,
  • Fabio C. P. Navarro,
  • Timur Galeev,
  • Mark Gerstein,
  • Ryan E. Mills,
  • Dong-Guk Shin,
  • Charles Lee,
  • Ankit Malhotra

DOI
https://doi.org/10.1186/s13059-018-1404-6
Journal volume & issue
Vol. 19, no. 1
pp. 1–14

Abstract

Comprehensive and accurate identification of structural variations (SVs) from next-generation sequencing data remains a major challenge. We develop FusorSV, which uses a data mining approach to assess performance and merge callsets from an ensemble of SV-calling algorithms. It includes a fusion model built from an analysis of 27 deep-coverage human genomes from the 1000 Genomes Project. We identify 843 novel SV calls that were not reported by the 1000 Genomes Project for these 27 samples. Experimental validation of a subset of these calls yields a validation rate of 86.7%. FusorSV is available at https://github.com/TheJacksonLaboratory/SVE.
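The abstract does not spell out how the fusion model combines callsets. For orientation only, the sketch below shows a much simpler, generic ensemble baseline: calls of the same SV type from different callers are clustered when they share at least 50% reciprocal overlap, and a cluster is kept only if enough distinct callers support it. This is not FusorSV's data-mining fusion model. The `SVCall` record, the function names, the 0.5 overlap threshold, the two-caller support rule and the caller names are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import median_low
from typing import List, Tuple


@dataclass(frozen=True)
class SVCall:
    """One SV call from one caller (hypothetical record type)."""
    caller: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    svtype: str  # e.g. "DEL", "DUP", "INV"


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Overlap length divided by the longer call's length.

    A value >= t means the overlap covers at least a fraction t of BOTH calls.
    Calls on different chromosomes or of different SV types never overlap.
    """
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


# (chrom, start, end, svtype, supporting callers)
Merged = Tuple[str, int, int, str, List[str]]


def merge_callsets(calls: List[SVCall], min_ro: float = 0.5,
                   min_support: int = 2) -> List[Merged]:
    """Greedy consensus merge: cluster by reciprocal overlap, then vote."""
    clusters: List[List[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        # Compare against the first member (representative) of each cluster.
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], call) >= min_ro:
                cluster.append(call)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append([call])

    merged: List[Merged] = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        # Keep only clusters reported by enough distinct callers.
        if len(callers) >= min_support:
            merged.append((cluster[0].chrom,
                           median_low([c.start for c in cluster]),
                           median_low([c.end for c in cluster]),
                           cluster[0].svtype,
                           callers))
    return merged
```

For example, a deletion reported at chr1:1000-2000 by callerA and at chr1:1050-2010 by callerB has a reciprocal overlap of 0.95, so the two calls merge into one deletion supported by both callers, while a deletion reported only by a third caller is dropped at `min_support=2`. Equal-weight voting like this ignores how reliable each caller is for a given SV type and size; a performance-aware approach, such as the one the abstract describes, would instead weigh each caller's evidence by its assessed accuracy.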

Keywords