BMC Genomics (Dec 2020)

Using earth mover’s distance for viral outbreak investigations

  • Andrew Melnyk,
  • Sergey Knyazev,
  • Fredrik Vannberg,
  • Leonid Bunimovich,
  • Pavel Skums,
  • Alex Zelikovsky

DOI
https://doi.org/10.1186/s12864-020-06982-4
Journal volume & issue
Vol. 21, no. S5
pp. 1 – 9

Abstract

Background

RNA viruses mutate at extremely high rates, forming intra-host viral populations of closely related variants. This diversity allows them to evade the host's immune system and makes them particularly dangerous. Viral outbreaks pose a significant threat to public health, and responding to them requires inferring transmission clusters, i.e., deciding whether two viral samples belong to the same outbreak. Next-generation sequencing (NGS) can help considerably with such outbreak-related problems. However, although NGS data are first obtained as short reads, existing methods rely on assembled sequences. This requires reconstructing the entire viral population, which is complicated, error-prone and time-consuming.

Results

Experimental validation using sequencing data from hepatitis C virus (HCV) outbreaks shows that the proposed algorithm can identify genetic relatedness between viral populations; infer transmission directions, transmission clusters and outbreak sources; and decide whether the source is present in the sequenced outbreak sample and, if so, identify it.

Conclusions

The introduced algorithm can cluster genetically related samples, infer transmission directions and predict the sources of outbreaks. Validation on experimental data demonstrated that it is able to reconstruct various transmission characteristics. An advantage of the method is that it works directly on raw NGS reads, bypassing cumbersome read assembly; this removes a source of newly introduced errors and saves processing time.
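To make the title's central notion concrete, the following is a minimal sketch of an earth mover's distance (EMD) between two viral populations. The abstract does not describe how the authors represent populations or which ground distance they use, so everything here is an assumption: each population is a hypothetical map from equal-length aligned variant sequences to read counts, counts are normalized to frequencies, and Hamming distance serves as the ground distance. The helper names `hamming` and `emd` are illustrative and are not taken from the paper. The transportation problem is solved exactly as a min-cost flow by successive shortest paths (Bellman-Ford). This is a generic EMD computation, not the authors' read-based pipeline.

```python
from fractions import Fraction


def hamming(u: str, v: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(u, v))


def emd(pop_a: dict[str, int], pop_b: dict[str, int]) -> Fraction:
    """EMD between two populations given as {variant: read count} (hypothetical format).

    Assumption: counts are normalized to frequencies and Hamming distance is the
    ground distance. The transport problem is solved exactly as a min-cost flow.
    """
    va, vb = list(pop_a), list(pop_b)
    na, nb = sum(pop_a.values()), sum(pop_b.values())
    # Nodes: 0 = source, then variants of A, then variants of B, last = sink.
    n = len(va) + len(vb) + 2
    s, t = 0, n - 1
    # Each edge is [target, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(n)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, Fraction(0), -cost, len(graph[u]) - 1])

    for i, x in enumerate(va):
        add_edge(s, 1 + i, Fraction(pop_a[x], na), 0)  # supply = frequency in A
        for j, y in enumerate(vb):
            add_edge(1 + i, 1 + len(va) + j, Fraction(1), hamming(x, y))
    for j, y in enumerate(vb):
        add_edge(1 + len(va) + j, t, Fraction(pop_b[y], nb), 0)  # demand = frequency in B

    total_cost, remaining = Fraction(0), Fraction(1)
    while remaining > 0:
        # Bellman-Ford: reverse residual edges may carry negative costs.
        dist = [None] * n
        prev = [None] * n  # (node, edge index) used to reach each node
        dist[s] = 0
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] is None:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and (dist[v] is None or dist[u] + cost < dist[v]):
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        updated = True
            if not updated:
                break
        if dist[t] is None:
            break  # unreachable: both frequency vectors sum to 1
        # Bottleneck capacity along the shortest path.
        push, v = remaining, t
        while v != s:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        # Augment: move `push` units of mass along the path.
        v = t
        while v != s:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        total_cost += push * dist[t]
        remaining -= push
    return total_cost
```

For example, `emd({"AAAA": 1, "TTTT": 1}, {"AAAA": 2})` moves half of the mass a Hamming distance of 4 and evaluates to 2. Exact `Fraction` arithmetic avoids floating-point tolerance issues in the residual capacities. For realistic population sizes, a dedicated optimal-transport solver would be preferable to this didactic loop.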

Keywords