Samovar: Single-Sample Mosaic Single-Nucleotide Variant Calling with Linked Reads
Charlotte A. Darby,
James R. Fitch,
Patrick J. Brennan,
Benjamin J. Kelly,
Natalie Bir,
Vincent Magrini,
Jeffrey Leonard,
Catherine E. Cottrell,
Julie M. Gastier-Foster,
Richard K. Wilson,
Elaine R. Mardis,
Peter White,
Ben Langmead,
Michael C. Schatz
Affiliations
Charlotte A. Darby
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
James R. Fitch
The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
Patrick J. Brennan
The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
Benjamin J. Kelly
The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
Natalie Bir
The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
Vincent Magrini
The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
Jeffrey Leonard
Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA; Department of Neurosurgery, Nationwide Children's Hospital, Columbus, OH, USA
Catherine E. Cottrell
The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
Julie M. Gastier-Foster
The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
Richard K. Wilson
The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
Elaine R. Mardis
The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
Peter White
The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
Ben Langmead
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Corresponding author
Michael C. Schatz
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Biology, Johns Hopkins University, Baltimore, MD, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Corresponding author
Summary: Linked-read sequencing greatly improves haplotype assembly over standard paired-end analysis. The detection of mosaic single-nucleotide variants (SNVs) benefits from haplotype assembly when the model is informed by the mapping between constituent reads and linked reads. Samovar evaluates haplotype-discordant reads identified through linked-read sequencing, thus enabling phasing and mosaic variant detection across the entire genome. Samovar trains a random forest model to score candidate sites using a dataset that considers read quality, phasing, and linked-read characteristics. Samovar calls mosaic SNVs within a single sample with accuracy comparable to what previously required trios or matched tumor/normal pairs, and it outperforms single-sample mosaic variant callers at minor allele frequencies of 5%–50% with at least 30X coverage. Samovar finds somatic variants in both tumor and normal whole-genome sequencing from 13 pediatric cancer cases that can be corroborated with high recall by whole-exome sequencing. Samovar is available as open source at https://github.com/cdarby/samovar under the MIT license.
Subject Areas: Biological Sciences, Genomics, Bioinformatics
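To make the notion of haplotype-discordant reads in the Summary concrete, the following is a minimal sketch of how per-site evidence might be summarized before scoring. It is an illustration under stated assumptions, not Samovar's implementation: the `Read` record, the `site_features` function, and the feature names are all hypothetical, and the actual feature set used by the random forest model is not specified here. The sketch assumes each read overlapping a candidate site already carries a haplotype assignment (1 or 2, or 0 if unphased) and treats a read as discordant when its base disagrees with the majority base of its own haplotype.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical record for one read overlapping a candidate site."""
    base: str       # base observed at the candidate site
    haplotype: int  # 1 or 2 if phased, 0 if unphased (assumed encoding)


def site_features(reads):
    """Summarize one candidate site as a dict of illustrative features."""
    depth = len(reads)
    if depth == 0:
        return {"depth": 0, "maf": 0.0, "phased_frac": 0.0, "discordant_frac": 0.0}

    # Minor allele frequency: share of reads carrying the second most common base.
    ranked = Counter(r.base for r in reads).most_common()
    maf = ranked[1][1] / depth if len(ranked) > 1 else 0.0

    phased = [r for r in reads if r.haplotype in (1, 2)]

    # A phased read is haplotype-discordant if its base differs from the
    # majority base among reads assigned to the same haplotype.
    discordant = 0
    for hap in (1, 2):
        bases = Counter(r.base for r in phased if r.haplotype == hap)
        if bases:
            discordant += sum(bases.values()) - bases.most_common(1)[0][1]

    return {
        "depth": depth,
        "maf": maf,
        "phased_frac": len(phased) / depth,
        "discordant_frac": discordant / len(phased) if phased else 0.0,
    }


# Toy pileup: haplotype 1 is uniformly A, while haplotype 2 carries a
# low-frequency G, the pattern expected of a mosaic variant.
example = (
    [Read("A", 1)] * 10
    + [Read("A", 2)] * 8
    + [Read("G", 2)] * 2
)
features = site_features(example)
```

In this toy pileup the two G reads are discordant with the rest of haplotype 2, whereas a germline heterozygous site would show one haplotype uniformly carrying the alternate base and no discordant reads. Feature dictionaries of this kind could then be passed to a classifier, such as a random forest, trained on labeled sites.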