Viruses (Jul 2022)

Sequence Similarity Network Analysis Provides Insight into the Temporal and Geographical Distribution of Mutations in SARS-CoV-2 Spike Protein

  • Shruti S. Patil,
  • Helen N. Catanese,
  • Kelly A. Brayton,
  • Eric T. Lofgren,
  • Assefaw H. Gebremedhin

DOI
https://doi.org/10.3390/v14081672
Journal volume & issue
Vol. 14, no. 8
p. 1672

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which still infects hundreds of thousands of people globally each day despite various countermeasures, has been mutating rapidly. Mutations in the spike (S) protein seem to play a vital role in viral stability, transmission, and adaptability. Therefore, to control the spread of the virus, it is important to gain insight into the evolution and transmission of the S protein. This study examines the temporal and geographical distribution of mutant S proteins in sequences gathered across the US over a period of 19 months in 2020 and 2021. The S protein sequences are studied using two approaches: (i) multiple sequence alignment is used to identify prominent mutations and highly mutable regions, and (ii) sequence similarity networks are subsequently employed to gain further insight and to study the mutation profiles of variants of concern across the defined time periods and states. Additionally, we tracked the variants using visualizations on geographical maps. The visualizations produced using Directed Weighted All Nearest Neighbors (DiWANN) networks and maps provided insights into the transmission of the virus that closely reflect the statistics reported for the time periods studied. We found that the networks created using DiWANN are superior to commonly used approximate distance networks created using BLAST bitscores. The study offers a richer computational approach for analyzing the transmission profile of prominent S protein mutations in SARS-CoV-2 and can be extended to other proteins and viruses.
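To make the network idea concrete, the following is a minimal sketch of an all-nearest-neighbors construction as the DiWANN name suggests: each sequence gets a directed edge to every sequence at its minimum distance, weighted by that distance. This is not the authors' implementation. The use of Levenshtein edit distance, the naive all-pairs computation, the function names, and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def all_nearest_neighbor_edges(seqs: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return directed edges (source, target, weight).

    Each source points to every sequence tied for its minimum distance.
    Hypothetical helper: computes all pairwise distances naively.
    """
    names = list(seqs)
    dist = {}
    for u, v in combinations(names, 2):
        d = edit_distance(seqs[u], seqs[v])
        dist[u, v] = dist[v, u] = d

    edges = []
    for u in names:
        others = [v for v in names if v != u]
        if not others:
            continue
        best = min(dist[u, v] for v in others)
        edges.extend((u, v, best) for v in others if dist[u, v] == best)
    return edges


# Toy amino-acid fragments (made up, not real spike protein data).
toy = {
    "ref": "ACDEFG",
    "a": "ACDEFH",   # one substitution from ref
    "b": "ACDQFH",   # one further substitution from a
    "c": "ACDEF",    # one deletion from ref
}
edges = all_nearest_neighbor_edges(toy)
for source, target, weight in edges:
    print(f"{source} -> {target} (distance {weight})")
```

Because edges are directed, a sequence can point to a neighbor that does not point back: here `b` links only to `a`, while `a` links to `ref`, `b`, and `c`, which are all tied at distance 1.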

Keywords