Frontiers in Bioinformatics (Oct 2022)

Early detection of emerging SARS-CoV-2 variants of interest for experimental evaluation

  • Zachary S. Wallace,
  • James Davis,
  • Anna Maria Niewiadomska,
  • Robert D. Olson,
  • Maulik Shukla,
  • Rick Stevens,
  • Yun Zhang,
  • Christian M. Zmasek,
  • Richard H. Scheuermann

DOI
https://doi.org/10.3389/fbinf.2022.1020189
Journal volume & issue
Vol. 2

Abstract

Since the beginning of the COVID-19 pandemic, SARS-CoV-2 has demonstrated its ability to rapidly and continuously evolve, leading to the emergence of thousands of different sequence variants, many with distinctive phenotypic properties. Fortunately, the broad application of next-generation sequencing (NGS) across the globe has produced a wealth of SARS-CoV-2 genome sequences, offering a comprehensive picture of how this virus is evolving so that accurate diagnostics, reliable therapeutics, and prophylactic vaccines against COVID-19 can be developed and maintained. The millions of SARS-CoV-2 sequences deposited in genomic sequence databases, including GenBank, BV-BRC, and GISAID, are annotated with the dates and geographic locations of sample collection, and can be aligned to and compared with the Wuhan-Hu-1 reference genome to extract their constellation of nucleotide and amino acid substitutions. Aggregating these data into concise datasets makes it possible to assess how variants spread through space and time. Variant-tracking efforts initially focused on the Spike protein because of its critical role in viral tropism and antibody neutralization. To identify emerging variants of concern as early as possible, we developed a computational pipeline to process the genomic data and assign risk scores based on both epidemiological and functional parameters. Epidemiological dynamics are used to identify variants exhibiting substantial growth over time and spread across geographical regions. Experimental data quantifying Spike protein regions that are targeted by adaptive immunity, and regions critical for other viral characteristics, are used to predict variants with consequential immunogenic and pathogenic impacts. The growth assessment and functional impact scores are combined to produce a Composite Score for any detected set of Spike substitutions.
With this systematic method to routinely score and rank emerging variants, we have established an approach to identify threatening variants early and prioritize them for experimental evaluation.
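To make the scoring workflow described above more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' pipeline: the monthly binning, the fold-change growth metric scaled by the number of regions, the per-site weight table, and the multiplicative combination into a composite score are all hypothetical choices, and the names `SeqRecord`, `aggregate`, `growth_score`, `impact_score`, and `rank_variants` are illustrative. It assumes Spike substitutions have already been extracted by alignment to Wuhan-Hu-1 and are written in the form `E484K`; the toy records and site weights are made up.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SeqRecord:
    collection_date: date
    region: str
    spike_subs: frozenset  # Spike substitutions vs. Wuhan-Hu-1, e.g. {"N501Y"}


def month(d: date) -> str:
    """Bucket a collection date into a calendar month such as '2021-03'."""
    return f"{d.year}-{d.month:02d}"


def aggregate(records):
    """Collapse individual sequences into counts per (variant, month),
    total sequences per month, and the set of regions per variant."""
    counts, totals, regions = Counter(), Counter(), defaultdict(set)
    for r in records:
        m = month(r.collection_date)
        counts[(r.spike_subs, m)] += 1
        totals[m] += 1
        regions[r.spike_subs].add(r.region)
    return counts, totals, regions


def growth_score(variant, counts, totals, regions, prev_m, curr_m, pseudo=0.01):
    """Hypothetical growth metric: fold change in prevalence between two
    months (with a pseudocount), scaled by geographic spread."""
    def prevalence(m):
        return counts[(variant, m)] / totals[m] if totals[m] else 0.0

    fold = (prevalence(curr_m) + pseudo) / (prevalence(prev_m) + pseudo)
    return fold * len(regions[variant])


def impact_score(variant, site_weights):
    """Hypothetical functional score: sum of per-site weights (e.g. derived
    from epitope data) for each substituted position; assumes 'E484K' format."""
    return sum(site_weights.get(int(sub[1:-1]), 0.0) for sub in variant)


def rank_variants(records, site_weights, prev_m, curr_m):
    """Combine both scores (assumed here: growth * (1 + impact)) and rank."""
    counts, totals, regions = aggregate(records)
    scored = []
    for variant in regions:
        g = growth_score(variant, counts, totals, regions, prev_m, curr_m)
        f = impact_score(variant, site_weights)
        scored.append((g * (1.0 + f), variant))
    return sorted(scored, key=lambda t: t[0], reverse=True)


# Toy usage with made-up records and made-up site weights (not real data).
A = frozenset({"N501Y", "E484K"})
B = frozenset({"D614G"})
records = (
    [SeqRecord(date(2021, 2, 10), "US", B)] * 3
    + [SeqRecord(date(2021, 2, 12), "US", A)]
    + [SeqRecord(date(2021, 3, 5), "US", A)] * 2
    + [SeqRecord(date(2021, 3, 7), "UK", A)]
    + [SeqRecord(date(2021, 3, 9), "US", B)]
)
weights = {484: 1.0, 501: 0.5}
for score, variant in rank_variants(records, weights, "2021-02", "2021-03"):
    print(f"{score:6.2f}  {sorted(variant)}")
```

Using `1 + f` keeps a fast-growing variant visible even when none of its substituted sites carries a weight, while functional weights amplify the rank of growing variants; a weighted sum of the two components would be an equally reasonable choice in a sketch like this.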

Keywords