Инфекция и иммунитет (Feb 2021)
The use of statistical phylogenetics in virology
Abstract
Molecular phylogenetics, particularly statistical phylogenetics, is widely used to solve fundamental and applied problems in virology. Bayesian, or statistical, phylogenetic methods, which came into practice 10-15 years ago, markedly expanded the range of questions that can be answered by analyzing nucleotide and amino acid sequences. The ability to apply various evolutionary models makes it possible to infer the chronology, geography and dynamics of the spread of infection. For example, Bayesian analysis of the globally distributed HIV group M demonstrated with a probability of 99% that the most recent common ancestor of these viruses existed in the vicinity of Kinshasa (Democratic Republic of the Congo) in the early 1920s. Another study showed that H9N2 influenza virus most likely passed to humans from wild ducks in Hong Kong in the late 1960s. In addition, Bayesian analysis makes it possible to evaluate the effect of the measures taken on the course of an epidemic. For example, it was shown retrospectively that the rate of hepatitis C virus infection in Egypt increased by several orders of magnitude in the mid-20th century. This sharp rise in new cases is associated with schistosomiasis treatment in which non-sterile syringes were reused. Bayesian methods have been applied in tens of thousands of studies describing various aspects of the emergence and spread of infectious diseases in humans and animals. This was facilitated by the development and accessibility of software that implements these methods. The complexity of Bayesian phylogenetic methods imposes strict requirements on the data being analyzed. The correctness of phylogenetic analysis results depends on various factors. For example, it is necessary to choose an evolutionary model that most adequately describes the studied objects. Justifying the selected model is a mandatory step in reporting the results.
Viruses typically acquire genetic elements from other organisms, so the genomes of even closely related viruses may contain non-homologous regions that are unsuitable for phylogenetic analysis. Another aspect is the creation of a representative dataset. Sometimes publications do not report all stages of the analysis, so the results obtained can be interpreted ambiguously. The correct use of statistical phylogenetic methods in virology is possible only with an understanding of their principles, proper methods of data preparation, and the criteria for selecting an evolutionary model.
Keywords