Bio-Protocol (Mar 2024)
Classification of a Massive Number of Viral Genomes and Estimation of Time of Most Recent Common Ancestor (tMRCA) of SARS-CoV-2 Using Phylodynamic Analysis
Abstract
Estimating the time of most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. The estimate is based on the genetic diversity accumulated over a given time period. Thousands of mutation sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose early, before the start of the pandemic, and can be used to classify the genomes into three main haplotypes. Tracing the origin of those three haplotypes may help to understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and estimating tMRCA using a Bayesian phylodynamic method. This protocol may also be used in the analysis of other viral genomes.

Key features
• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.
• Classification of genomes based on highly linked sites using custom scripts.
• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).
• Visualization of the posterior distribution of tMRCA using Tracer v1.7.2.
• Optimized for SARS-CoV-2.

Graphical overview
Graphical workflow of the time of most recent common ancestor (tMRCA) estimation process
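
To make the classification step concrete, the following minimal Python sketch assigns aligned genomes to haplotypes from the alleles observed at six linked sites. It is an illustration only and is not the custom script used in this protocol: the column indices in LINKED_SITES, the allele patterns in HAPLOTYPES, the labels H1 to H3, and the toy sequences are all hypothetical placeholders, not real SARS-CoV-2 coordinates or data.

```python
from collections import Counter

# Hypothetical 0-based alignment columns of the six linked sites
# (placeholders, not real SARS-CoV-2 genome coordinates).
LINKED_SITES = [0, 2, 4, 6, 8, 10]

# Hypothetical allele patterns at LINKED_SITES mapped to haplotype labels.
HAPLOTYPES = {
    "CTGACT": "H1",
    "TCAGTC": "H2",
    "CTGGTC": "H3",
}


def classify(aligned_seq, sites=LINKED_SITES, table=HAPLOTYPES):
    """Return the haplotype label for one aligned genome, or 'unclassified'."""
    pattern = "".join(aligned_seq[i].upper() for i in sites)
    return table.get(pattern, "unclassified")


def classify_all(records):
    """Classify {genome_id: aligned_sequence} and count genomes per label."""
    labels = {gid: classify(seq) for gid, seq in records.items()}
    return labels, Counter(labels.values())


# Toy aligned sequences (invented for illustration only).
toy_records = {
    "g1": "CATAGAAACATA",
    "g2": "TACAAAGATACA",
    "g3": "CATAGAGATACA",
    "g4": "NATAGAAACATA",  # ambiguous base at a linked site
}

labels, counts = classify_all(toy_records)
for genome_id, label in labels.items():
    print(genome_id, label)
```

In this sketch, genomes with an ambiguous base or an unlisted allele combination at any linked site are reported as unclassified rather than forced into a haplotype. That is one possible design choice and may differ from how the protocol handles such genomes.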