PLoS ONE (Jan 2022)

Worldwide SARS-CoV-2 haplotype distribution in early pandemic.

  • Andrea Cairo,
  • Marilena V Iorio,
  • Silvia Spena,
  • Elda Tagliabue,
  • Flora Peyvandi

DOI
https://doi.org/10.1371/journal.pone.0263705
Journal volume & issue
Vol. 17, no. 2
p. e0263705

Abstract

Read online

The world is experiencing one of the most severe viral outbreaks in the last few years, the pandemic infection by SARS-CoV-2, the causative agent of COVID-19 disease. As of December 10th 2021, the virus has spread worldwide, with a total number of more than 267 million of confirmed cases (four times more in the last year), and more than 5 million deaths. A great effort has been undertaken to molecularly characterize the virus, track the spreading of different variants across the globe with the aim to understand the potential effects in terms of transmission capability and different fatality rates. Here we focus on the genomic diversity and distribution of the virus in the early stages of the pandemic, to better characterize the origin of COVID-19 and to define the geographical and temporal evolution of genetic clades. By performing a comparative analysis of 75401 SARS-CoV-2 reported sequences (as of December 2020), using as reference the first viral sequence reported in Wuhan in December 2019, we described the existence of 26538 genetic variants, the most frequent clustering into four major clades characterized by a specific geographical distribution. Notably, we found the most frequent variant, the previously reported missense p.Asp614Gly in the S protein, as a single mutation in only three patients, whereas in the large majority of cases it occurs in concomitance with three other variants, suggesting a high linkage and that this variant alone might not provide a significant selective advantage to the virus. Moreover, we evaluated the presence and the distribution in our dataset of the mutations characterizing the so called "british variant", identified at the beginning of 2021, and observed that 9 out of 17 are present only in few sequences, but never in linkage with each other, suggesting a synergistic effect in this new viral strain. In summary, this is a large-scale analysis of SARS-CoV-2 deposited sequences, with a particular focus on the geographical and temporal evolution of genetic clades in the early phase of COVID-19 pandemic.