Advances in Virology (Jan 2023)
Whole-Genome Comparison of Representatives of All Variants of SARS-CoV-2, Including Subvariant BA.2 and the GKA Clade
Abstract
Since its discovery at the end of 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has rapidly evolved into many variants, including the subvariant BA.2 and the GKA clade. Genomic characterization is needed to better manage the current pandemic and the possible emergence of novel variants. The sequence of the reference genome Wuhan-Hu-1 and approximately 20 representatives of each variant were downloaded from GenBank and GISAID. Two representatives with no tracts of undetermined nucleotides were selected. The sequences were aligned using MUSCLE. Insertion/deletion (indel) sites in the genome were mapped relative to the open reading frames (ORFs) of Wuhan-Hu-1. The phylogeny of the spike protein coding region was constructed using the maximum likelihood method. Amino acid substitutions in all ORFs were analyzed separately. Two indel sites were found in ORF1AB, eight in spike, and one each in ORF3A, matrix (MA), nucleoprotein (NP), and the 3′-untranslated region (3′UTR). Some indel sites and amino acid residues/substitutions are shared among variants, whereas others are variant-specific. The phylogeny shows that Omicron, Deltacron, and BA.2 cluster together and are separated from the other variants with 100% bootstrap support. In conclusion, whole-genome comparison of representatives of all variants revealed indel patterns that are specific to SARS-CoV-2 variants or subvariants. Comparison of polymorphic amino acids across all coding regions also showed residues shared by specific groups of variants. Finally, the higher transmissibility of BA.2 might be due at least in part to the 48 nucleotide deletions in the 3′UTR, whereas the apparent extinction of the GKA clade is due to the lack of genetic advantages as a consequence of amino acid substitutions in various genes.
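The indel-mapping step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a pairwise alignment (reference row and variant row of equal length, gaps written as "-") has already been produced by an aligner such as MUSCLE, and the function names `find_indels` and `assign_region`, as well as the toy region coordinates, are hypothetical. Real Wuhan-Hu-1 ORF coordinates would have to be taken from the reference annotation.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int, int]  # (kind, reference position, length)


def find_indels(ref_aln: str, qry_aln: str) -> List[Event]:
    """Scan two aligned rows and report each gap run as one indel event.

    Positions are 1-based reference coordinates. For a deletion, the position
    is the first deleted reference base. For an insertion, it is the reference
    base immediately before the inserted bases (0 if before the first base).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have the same length")
    events: List[Event] = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r != "-" and q == "-":  # gap in the variant row: deletion
            start, length = ref_pos + 1, 0
            while i < n and ref_aln[i] != "-" and qry_aln[i] == "-":
                length += 1
                ref_pos += 1
                i += 1
            events.append(("deletion", start, length))
        elif r == "-" and q != "-":  # gap in the reference row: insertion
            length = 0
            while i < n and ref_aln[i] == "-" and qry_aln[i] != "-":
                length += 1
                i += 1
            events.append(("insertion", ref_pos, length))
        else:  # aligned pair, or a column that is a gap in both rows
            if r != "-":
                ref_pos += 1
            i += 1
    return events


def assign_region(pos: int, regions: Dict[str, Tuple[int, int]]) -> str:
    """Return the name of the region (1-based, inclusive) containing pos."""
    for name, (start, end) in regions.items():
        if start <= pos <= end:
            return name
    return "intergenic"


if __name__ == "__main__":
    # Toy alignment and toy region coordinates, for illustration only.
    ref_row = "ATGCCGTA--ACGT"
    var_row = "ATG--GTATTA--T"
    toy_regions = {"orfA": (1, 6), "orfB": (7, 9), "utr3": (10, 12)}
    for kind, pos, length in find_indels(ref_row, var_row):
        print(kind, pos, length, assign_region(pos, toy_regions))
```

Note that this sketch treats leading or trailing gaps in the variant row as deletions, so incomplete sequence ends would need to be trimmed or masked before the events are interpreted biologically.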