Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows

Ryan Connor; Migun Shakya; David A. Yarmosh; Wolfgang Maier; Ross Martin; Rebecca Bradford; J. Rodney Brister; Patrick S. G. Chain; Courtney A. Copeland; Julia di Iulio; Bin Hu; Philip Ebert; Jonathan Gunti; Yumi Jin; Kenneth S. Katz; Andrey Kochergin; Tré LaRosa; Jiani Li; Po-E Li; Chien-Chi Lo; Sujatha Rashid; Evguenia S. Maiorova; Chunlin Xiao; Vadim Zalunin; Lisa Purcell; Kim D. Pruitt

doi:10.3390/v16030430

Viruses (Mar 2024)

Affiliations

Ryan Connor: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Migun Shakya: Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
David A. Yarmosh: American Type Culture Collection, Manassas, VA 20110, USA
Wolfgang Maier: Galaxy Europe Team, University of Freiburg, 79085 Freiburg, Germany
Ross Martin: Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA
Rebecca Bradford: American Type Culture Collection, Manassas, VA 20110, USA
J. Rodney Brister: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Patrick S. G. Chain: Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Courtney A. Copeland: Deloitte Consulting LLP, Rosslyn, VA 22209, USA
Julia di Iulio: Vir Biotechnology Inc., San Francisco, CA 94158, USA
Bin Hu: Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Philip Ebert: Eli Lilly and Company, Indianapolis, IN 46225, USA
Jonathan Gunti: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Yumi Jin: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Kenneth S. Katz: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Andrey Kochergin: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Tré LaRosa: Deloitte Consulting LLP, Rosslyn, VA 22209, USA
Jiani Li: Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA
Po-E Li: Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Chien-Chi Lo: Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Sujatha Rashid: American Type Culture Collection, Manassas, VA 20110, USA
Evguenia S. Maiorova: Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA
Chunlin Xiao: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Vadim Zalunin: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Lisa Purcell: Vir Biotechnology Inc., San Francisco, CA 94158, USA
Kim D. Pruitt: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

DOI: https://doi.org/10.3390/v16030430
Journal volume & issue: Vol. 16, no. 3, p. 430

Abstract

Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed for rapid identification of genetic variants, enabling the timely design and testing of therapies and deployment of new vaccine formulations to combat the new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may impact identification of variants. Here, we show that for both Illumina and Oxford Nanopore sequencing platforms, downstream bioinformatic protocols used by industry, government, and academic groups resulted in different virus sequences from the same sample. These bioinformatic workflows produced consensus genomes that differed in single nucleotide polymorphisms and in the inclusion or exclusion of insertions and/or deletions, despite using the same raw sequence data as input. We compare and characterize these discrepancies and propose a specific suite of parameters and protocols that should be adopted across the field. Consistent results from bioinformatic workflows are fundamental to SARS-CoV-2 and future pathogen surveillance efforts, including pandemic preparation, to allow for a data-driven and timely public health response.

Published in Viruses

ISSN: 1999-4915 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Microbiology
Website: http://www.mdpi.com/journal/viruses
