Human Genomics (May 2021)
An application of slow feature analysis to the genetic sequences of coronaviruses and influenza viruses
Abstract
Abstract Background Mathematical approaches have been for decades used to probe the structure of DNA sequences. This has led to the development of Bioinformatics. In this exploratory work, a novel mathematical method is applied to probe the DNA structure of two related viral families: those of coronaviruses and those of influenza viruses. The coronaviruses are SARS-CoV-2, SARS-CoV-1, and MERS. The influenza viruses include H1N1-1918, H1N1-2009, H2N2-1957, and H3N2-1968. Methods The mathematical method used is the slow feature analysis (SFA), a rather new but promising method to delineate complex structure in DNA sequences. Results The analysis indicates that the DNA sequences exhibit an elaborate and convoluted structure akin to complex networks. We define a measure of complexity and show that each DNA sequence exhibits a certain degree of complexity within itself, while at the same time there exists complex inter-relationships between the sequences within a family and between the two families. From these relationships, we find evidence, especially for the coronavirus family, that increasing complexity in a sequence is associated with higher transmission rate but with lower mortality. Conclusions The complexity measure defined here may hold a promise and could become a useful tool in the prediction of transmission and mortality rates in future new viral strains.
Keywords