Frontiers in Genetics (2021-03-01)

stLFRsv: A Germline Structural Variant Analysis Pipeline Using Co-barcoded Reads

  • Junfu Guo,
  • Chang Shi,
  • Xi Chen,
  • Ou Wang,
  • Ping Liu,
  • Huanming Yang,
  • Xun Xu,
  • Wenwei Zhang,
  • Hongmei Zhu

DOI
https://doi.org/10.3389/fgene.2021.636239
Journal volume & issue
Vol. 12

Abstract

Co-barcoded reads originating from long DNA fragments (mean length >30 kbp) retain both single-base-level accuracy and long-range genomic information. We propose a pipeline, stLFRsv, that detects structural variation using co-barcoded reads. stLFRsv identifies abnormally large gaps between co-barcoded reads to detect potential breakpoints and to reconstruct complex structural variants (SVs). Haplotype phasing with co-barcoded reads increases the signal-to-noise ratio, and barcode-sharing profiles are used to filter out false positives. We integrate stLFRsv with the short-read SV caller smoove, which handles smaller variants. The integrated pipeline was evaluated on the well-characterized genome HG002/NA24385, achieving 74.5% precision and 22.4% recall for deletions. stLFRsv also revealed some large variants that are not included in the benchmark set but were verified by long reads or assembly. For the HG001/NA12878 genome, stLFRsv likewise achieved the best performance in both resource usage and the detection of large variants. Our work indicates that co-barcoded read technology has the potential to improve genome completeness.
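
The gap-based idea described in the abstract can be illustrated with a minimal sketch. This is not the stLFRsv implementation: the function name `find_large_gaps`, the input layout of `(barcode, chromosome, position)` tuples, and the 50 kbp threshold are all assumptions chosen for illustration. The sketch groups read positions by barcode and reports adjacent reads whose separation exceeds the threshold, which is the kind of signal that could mark a candidate breakpoint.

```python
from collections import defaultdict


def find_large_gaps(reads, max_gap=50_000):
    """Return (barcode, chrom, left_pos, right_pos) for suspicious gaps.

    reads   : iterable of (barcode, chrom, pos) tuples (hypothetical layout)
    max_gap : illustrative threshold in bp, not a value taken from stLFRsv
    """
    # Collect the mapped positions of every barcode on every chromosome.
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    candidates = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        # Compare each read with its nearest downstream neighbour.
        for left, right in zip(positions, positions[1:]):
            if right - left > max_gap:
                candidates.append((barcode, chrom, left, right))
    return candidates


# Toy input: barcode BC1 has one gap far larger than the threshold.
reads = [
    ("BC1", "chr1", 1_000),
    ("BC1", "chr1", 6_000),
    ("BC1", "chr1", 120_000),
    ("BC1", "chr1", 125_000),
    ("BC2", "chr1", 2_000),
    ("BC2", "chr1", 9_000),
]
candidates = find_large_gaps(reads)
```

A real pipeline would additionally need to separate gaps caused by two independent fragments sharing a barcode from gaps caused by a true variant; the abstract attributes that filtering to haplotype phasing and barcode-sharing profiles, which this sketch does not attempt.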

Keywords