Genome Biology (Jan 2022)

Assessing reproducibility of inherited variants detected with short-read whole genome sequencing

  • Bohu Pan,
  • Luyao Ren,
  • Vitor Onuchic,
  • Meijian Guan,
  • Rebecca Kusko,
  • Steve Bruinsma,
  • Len Trigg,
  • Andreas Scherer,
  • Baitang Ning,
  • Chaoyang Zhang,
  • Christine Glidewell-Kenney,
  • Chunlin Xiao,
  • Eric Donaldson,
  • Fritz J. Sedlazeck,
  • Gary Schroth,
  • Gokhan Yavas,
  • Haiying Grunenwald,
  • Haodong Chen,
  • Heather Meinholz,
  • Joe Meehan,
  • Jing Wang,
  • Jingcheng Yang,
  • Jonathan Foox,
  • Jun Shang,
  • Kelci Miclaus,
  • Lianhua Dong,
  • Leming Shi,
  • Marghoob Mohiyuddin,
  • Mehdi Pirooznia,
  • Ping Gong,
  • Rooz Golshani,
  • Russ Wolfinger,
  • Samir Lababidi,
  • Sayed Mohammad Ebrahim Sahraeian,
  • Steve Sherry,
  • Tao Han,
  • Tao Chen,
  • Tieliu Shi,
  • Wanwan Hou,
  • Weigong Ge,
  • Wen Zou,
  • Wenjing Guo,
  • Wenjun Bao,
  • Wenzhong Xiao,
  • Xiaohui Fan,
  • Yoichi Gondo,
  • Ying Yu,
  • Yongmei Zhao,
  • Zhenqiang Su,
  • Zhichao Liu,
  • Weida Tong,
  • Wenming Xiao,
  • Justin M. Zook,
  • Yuanting Zheng,
  • Huixiao Hong

DOI
https://doi.org/10.1186/s13059-021-02569-8
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 26

Abstract

Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing the reproducibility of inherited variants with WGS and the impact of each step in the process is needed for understanding and improving the quality of inherited variants from WGS.

Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when longer than 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×.

Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
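The abstract does not say how reproducibility is scored, so the following is only a minimal sketch of one common way to quantify agreement between replicate call sets: the mean pairwise Jaccard index, stratified by variant class (SNV, indel of at most 5 bp, indel longer than 5 bp). The metric choice, the function names, the tuple representation of a variant, and the toy call sets are all assumptions made for illustration; they are not taken from the study.

```python
from itertools import combinations

# Hypothetical representation: a variant is a (chrom, pos, ref, alt) tuple.


def variant_class(ref, alt):
    """Assign a variant to one of the size classes mentioned in the abstract."""
    if len(ref) == len(alt):
        # Equal-length alleles: a single-base change is an SNV; longer
        # substitutions are set aside as "other" in this sketch.
        return "SNV" if len(ref) == 1 else "other"
    size = abs(len(ref) - len(alt))
    return "indel>5bp" if size > 5 else "indel<=5bp"


def jaccard(a, b):
    """Shared calls divided by all calls; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stratify(calls):
    """Group one replicate's calls by variant class."""
    strata = {}
    for variant in calls:
        _, _, ref, alt = variant
        strata.setdefault(variant_class(ref, alt), set()).add(variant)
    return strata


def reproducibility(replicates):
    """Mean pairwise Jaccard index per variant class across replicates."""
    strata = [stratify(calls) for calls in replicates]
    classes = set().union(*(s.keys() for s in strata))
    result = {}
    for cls in classes:
        pair_scores = [
            jaccard(a.get(cls, set()), b.get(cls, set()))
            for a, b in combinations(strata, 2)
        ]
        result[cls] = sum(pair_scores) / len(pair_scores)
    return result


# Toy, made-up call sets standing in for three technical replicates.
rep1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
        ("chr2", 50, "AT", "A"), ("chr2", 300, "G", "GACGTACG")}
rep2 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
        ("chr2", 50, "AT", "A")}
rep3 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
        ("chr2", 300, "G", "GACGTAC")}

scores = reproducibility([rep1, rep2, rep3])
for cls in sorted(scores):
    print(cls, round(scores[cls], 3))
```

In practice the call sets would be read from VCF files and normalized (for example, left-aligned) before comparison, since the same indel can be written in more than one way; that step is omitted here.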