Computational and Structural Biotechnology Journal (Dec 2024)
Beyond mutations: Accounting for quantitative changes in the analysis of protein evolution
Abstract
Molecular phylogenetic research has relied on the analysis of the coding sequences of genes or of the amino acid sequences of the encoded proteins. Counting mismatches, as indicators of mutation, has been central to the pertinent algorithms. Individual amino acids possess quantifiable characteristics that enable the conversion from “words” (strings of letters denoting amino acids or bases) to “waves” (strings of quantitative values representing physico-chemical properties) or to matrices (coordinates representing positions in a comprehensive property space). Applying such numerical representations to evolutionary analysis takes into account not only the occurrence of mutations but also their properties as influences that drive speciation: selective pressures favor certain mutations over others, and this predilection is reflected in the characteristics of the incorporated amino acids (it is not borne out solely by the mismatches). Besides being more discriminating inputs for tree-generating algorithms than match/mismatch counts, the number strings can be examined for overall similarity with average mutual information, autocorrelation, and fractal dimension. Bivariate wavelet analysis aids in distinguishing hypermutable from conserved domains of the protein. The matrix representation is readily subjected to comparisons of distances, and it allows the generation of heat maps or graphs. For the protein S100A6, where tree construction with standard approaches yields conflicting results, this analysis preserves the accepted order of taxa. It also aids hypothesis generation about the origin of mitochondrial proteins. These analytical algorithms have been automated in R and are applicable to various processes that can be described in matrix format.
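The "words" to "waves" conversion described in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration and is not the authors' R implementation. It assumes the Kyte-Doolittle hydropathy scale as the single physico-chemical property (the abstract does not say which properties are used), a plain Euclidean distance between equal-length waves, and a standard sample autocorrelation estimator. The function names and the three short peptides are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy index (assumed here as the example property;
# the abstract does not specify which physico-chemical scales are used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_wave(seq):
    """Convert an amino acid 'word' into a 'wave' of property values."""
    return [KYTE_DOOLITTLE[aa] for aa in seq.upper()]


def mismatches(a, b):
    """Classical match/mismatch count for two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def wave_distance(a, b):
    """Euclidean distance between the property waves of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(to_wave(a), to_wave(b))))


def autocorrelation(wave, lag=1):
    """Sample autocorrelation of a wave at the given lag (assumed estimator)."""
    n = len(wave)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(wave)")
    mean = sum(wave) / n
    dev = [v - mean for v in wave]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("wave has zero variance")
    return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / denom


# Hypothetical peptides: one conservative (L -> I) and one radical (L -> D) change.
reference = "MKLVA"
conservative = "MKIVA"
radical = "MKDVA"

print(mismatches(reference, conservative), mismatches(reference, radical))
print(f"{wave_distance(reference, conservative):.1f}",
      f"{wave_distance(reference, radical):.1f}")
```

Both variants differ from the reference by exactly one mismatch, so a match/mismatch count cannot tell them apart. On the assumed hydropathy scale, however, the leucine to isoleucine change moves the wave by about 0.7 units, while the leucine to aspartate change moves it by about 7.3 units. This is the kind of quantitative difference the abstract argues mismatch counting leaves out.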