BIO Web of Conferences (Jan 2023)

Genome Analysis of 10K SARS-COV-2 Sequences to Identify the Presence of Single-Nucleotide Polymorphisms

  • Nugrahapraja Husna,
  • Hasna Syahira Nandrea,
  • Fauzi Alidza

DOI
https://doi.org/10.1051/bioconf/20237501005
Journal volume & issue
Vol. 75
p. 01005

Abstract

A new coronavirus, later named SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2), was identified in Wuhan, China, in December 2019. The high mutation rate of SARS-CoV-2 makes it challenging to develop vaccines that are effective against all variants. Substitution is the most common type of mutation in SARS-CoV-2. This research was conducted to identify the genetic variability of single-nucleotide polymorphisms (SNPs) in SARS-CoV-2 and to analyse their impact. About 15,000 SARS-CoV-2 sequences, isolated in 33 countries between February 2020 and July 2021, were downloaded from GISAID. Sequence analysis was performed using MAFFT and Nextclade. SNPs were called in the samples using SNP-sites and extracted using Excel. The most common variants were 20B, 20A, and 20I (Alpha), accounting for 32.12%, 23.95%, and 17.39% of the total population, respectively. Of the 10,107 SARS-CoV-2 sequences studied, 154 SNPs were found, with the highest numbers of SNPs in the spike, nsp3, and nucleocapsid genes. The highest ratios of mutation count to sequence length were in the ORF8, ORF7a, and ORF7b genes, with values of 0.537, 0.474, and 0.419, respectively. The results of this study are expected to help identify conserved regions in SARS-CoV-2 that can serve as probes for virus identification and as target regions in vaccine development.
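
As an illustration of the kind of calculation the abstract describes, the following minimal Python sketch finds variable columns in a multiple sequence alignment and relates the SNP count in each gene to that gene's length. It is not the authors' pipeline and does not reproduce how SNP-sites works internally. The function names, the toy alignment, and the gene coordinates (`geneA`, `geneB`) are hypothetical and chosen only to keep the example small; a real analysis would start from a MAFFT alignment and use the reference genome annotation.

```python
def snp_columns(aligned, ignore="-N"):
    """Return 0-based alignment columns where more than one base occurs.

    Gap and ambiguity characters listed in `ignore` are not counted as alleles.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for col in range(length):
        bases = {seq[col].upper() for seq in aligned} - set(ignore)
        if len(bases) > 1:
            sites.append(col)
    return sites


def snp_density(sites, genes):
    """Map each gene name to (SNP count, SNP count / gene length).

    `genes` maps a name to a half-open coordinate pair (start, end).
    """
    result = {}
    for name, (start, end) in genes.items():
        count = sum(start <= pos < end for pos in sites)
        result[name] = (count, count / (end - start))
    return result


# Toy data (hypothetical): four aligned sequences and two made-up genes.
toy_alignment = [
    "ATGCCGTAACGT",
    "ATGTCGTAACGT",
    "ATGCCGTAGCGT",
    "ATGCCGTA-CGT",
]
toy_genes = {"geneA": (0, 6), "geneB": (6, 12)}

sites = snp_columns(toy_alignment)
density = snp_density(sites, toy_genes)
print(sites)
print(density)
```

Normalising by gene length, as in the last step, is one way a short gene such as ORF8 can rank above a long gene such as spike even when it carries fewer SNPs in absolute terms.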

Keywords