Genome Biology (Jan 2022)
Assessing reproducibility of inherited variants detected with short-read whole genome sequencing
- Bohu Pan,
- Luyao Ren,
- Vitor Onuchic,
- Meijian Guan,
- Rebecca Kusko,
- Steve Bruinsma,
- Len Trigg,
- Andreas Scherer,
- Baitang Ning,
- Chaoyang Zhang,
- Christine Glidewell-Kenney,
- Chunlin Xiao,
- Eric Donaldson,
- Fritz J. Sedlazeck,
- Gary Schroth,
- Gokhan Yavas,
- Haiying Grunenwald,
- Haodong Chen,
- Heather Meinholz,
- Joe Meehan,
- Jing Wang,
- Jingcheng Yang,
- Jonathan Foox,
- Jun Shang,
- Kelci Miclaus,
- Lianhua Dong,
- Leming Shi,
- Marghoob Mohiyuddin,
- Mehdi Pirooznia,
- Ping Gong,
- Rooz Golshani,
- Russ Wolfinger,
- Samir Lababidi,
- Sayed Mohammad Ebrahim Sahraeian,
- Steve Sherry,
- Tao Han,
- Tao Chen,
- Tieliu Shi,
- Wanwan Hou,
- Weigong Ge,
- Wen Zou,
- Wenjing Guo,
- Wenjun Bao,
- Wenzhong Xiao,
- Xiaohui Fan,
- Yoichi Gondo,
- Ying Yu,
- Yongmei Zhao,
- Zhenqiang Su,
- Zhichao Liu,
- Weida Tong,
- Wenming Xiao,
- Justin M. Zook,
- Yuanting Zheng,
- Huixiao Hong
Affiliations
- Bohu Pan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University
- Vitor Onuchic
- Illumina Inc.
- Meijian Guan
- SAS Institute Inc.
- Rebecca Kusko
- Immuneering Corporation
- Steve Bruinsma
- Illumina Inc.
- Len Trigg
- Real Time Genomics
- Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki
- Baitang Ning
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi
- Christine Glidewell-Kenney
- Illumina Inc.
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
- Eric Donaldson
- Center for Drug Evaluation and Research, Food and Drug Administration
- Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine
- Gary Schroth
- Illumina Inc.
- Gokhan Yavas
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Haiying Grunenwald
- Illumina Inc.
- Haodong Chen
- Sentieon Inc.
- Heather Meinholz
- Illumina Inc.
- Joe Meehan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Jing Wang
- Center for Advanced Measurement Science, National Institute of Metrology
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University
- Jonathan Foox
- Department of Physiology and Biophysics, Weill Cornell Medicine
- Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University
- Kelci Miclaus
- SAS Institute Inc.
- Lianhua Dong
- Center for Advanced Measurement Science, National Institute of Metrology
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University
- Marghoob Mohiyuddin
- Roche Sequencing Solutions
- Mehdi Pirooznia
- Bioinformatics and Computational Biology Laboratory, National Heart Lung and Blood Institute, National Institutes of Health
- Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center
- Rooz Golshani
- Illumina Inc.
- Russ Wolfinger
- SAS Institute Inc.
- Samir Lababidi
- Office of Health Informatics, Office of the Commissioner, US Food and Drug Administration
- Sayed Mohammad Ebrahim Sahraeian
- Roche Sequencing Solutions
- Steve Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
- Tao Han
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Tao Chen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Tieliu Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University
- Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University
- Weigong Ge
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Wen Zou
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Wenjing Guo
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Wenjun Bao
- SAS Institute Inc.
- Wenzhong Xiao
- Stanford Genome Technology Center, Stanford University School of Medicine
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University
- Yoichi Gondo
- Department of Molecular Life Sciences, Tokai University School of Medicine
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University
- Yongmei Zhao
- CCR-SF Bioinformatics Group, Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science, Frederick National Laboratory for Cancer Research
- Zhenqiang Su
- Takeda Pharmaceuticals
- Zhichao Liu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- Wenming Xiao
- Division of Molecular Genetics and Pathology, Center for Device and Radiological Health, US Food and Drug Administration
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration
- DOI
- https://doi.org/10.1186/s13059-021-02569-8
- Journal volume & issue
- Vol. 23, no. 1, pp. 1 – 26
Abstract
Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine, but it is a complicated process in which each step affects variant call quality. A systematic assessment of the reproducibility of inherited variants detected with WGS, and of the impact of each step in the process, is needed to understand and improve the quality of inherited variant calls from WGS.

Results: To dissect the impact of the factors involved in detecting inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs, and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly those outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when longer than 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×.

Conclusions: Our findings highlight sources of variability in variant detection and the need to improve bioinformatics pipelines in the era of precision medicine with WGS.