iScience (Aug 2019)

Samovar: Single-Sample Mosaic Single-Nucleotide Variant Calling with Linked Reads

  • Charlotte A. Darby,
  • James R. Fitch,
  • Patrick J. Brennan,
  • Benjamin J. Kelly,
  • Natalie Bir,
  • Vincent Magrini,
  • Jeffrey Leonard,
  • Catherine E. Cottrell,
  • Julie M. Gastier-Foster,
  • Richard K. Wilson,
  • Elaine R. Mardis,
  • Peter White,
  • Ben Langmead,
  • Michael C. Schatz

Journal volume & issue
Vol. 18
pp. 1 – 10

Abstract

Summary: Linked-read sequencing greatly improves haplotype assembly over standard paired-end analysis. The detection of mosaic single-nucleotide variants (SNVs) benefits from haplotype assembly when the model is informed by the mapping between constituent reads and linked reads. Samovar evaluates haplotype-discordant reads identified through linked-read sequencing, thus enabling phasing and mosaic variant detection across the entire genome. Samovar trains a random forest model to score candidate sites using a dataset that considers read quality, phasing, and linked-read characteristics. Samovar calls mosaic SNVs within a single sample with accuracy comparable to that of methods that previously required trios or matched tumor/normal pairs, and it outperforms single-sample mosaic variant callers at minor allele frequencies of 5%–50% with at least 30X coverage. Samovar finds somatic variants in both tumor and normal whole-genome sequencing from 13 pediatric cancer cases that can be corroborated with high recall by whole-exome sequencing. Samovar is available open-source at https://github.com/cdarby/samovar under the MIT license.

Subject Areas: Biological Sciences, Genomics, Bioinformatics
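
The notion of a haplotype-discordant read mentioned in the summary can be illustrated with a minimal sketch. This is not Samovar's implementation: the function name `discordant_fraction`, the `(haplotype, base)` input format, and the example pileup are all hypothetical, and the sketch assumes each read covering the site has already been assigned to haplotype 1 or 2 by linked-read phasing. A read is counted as discordant when its base differs from the majority base among reads on the same haplotype.

```python
from collections import Counter


def discordant_fraction(reads):
    """Count haplotype-discordant reads at one candidate site.

    reads: list of (haplotype, base) pairs for phased reads covering the site
           (hypothetical input format, chosen for illustration only).
    Returns (n_discordant, n_phased, fraction_discordant).
    """
    # Tally the bases observed on each haplotype separately.
    by_hap = {}
    for hap, base in reads:
        by_hap.setdefault(hap, Counter())[base] += 1

    discordant = 0
    total = 0
    for counts in by_hap.values():
        # Reads carrying the most common base on this haplotype are concordant;
        # every other read on the haplotype is counted as discordant.
        majority = counts.most_common(1)[0][1]
        n = sum(counts.values())
        discordant += n - majority
        total += n

    fraction = discordant / total if total else 0.0
    return discordant, total, fraction


# Made-up pileup: 3 of the 15 reads on haplotype 1 carry a G instead of an A,
# while all 15 reads on haplotype 2 carry an A.
reads = [(1, "A")] * 12 + [(1, "G")] * 3 + [(2, "A")] * 15
result = discordant_fraction(reads)
```

In this made-up pileup, an alternate allele confined to a subset of reads on a single haplotype is the pattern a mosaic variant would be expected to produce, whereas a germline heterozygous variant would appear on essentially all reads of one haplotype. A count like this could serve as one input feature among many to a classifier such as the random forest the summary describes.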