Embedding gene trees into phylogenetic networks by conflict resolution algorithms

Marcin Wawerka; Dawid Dąbkowski; Natalia Rutecka; Agnieszka Mykowiecka; Paweł Górecki

doi:10.1186/s13015-022-00218-8

Algorithms for Molecular Biology (May 2022)

Affiliations

Marcin Wawerka: University of Warsaw, Faculty of Mathematics, Informatics and Mechanics
Dawid Dąbkowski: University of Warsaw, Faculty of Mathematics, Informatics and Mechanics
Natalia Rutecka: University of Warsaw, Faculty of Mathematics, Informatics and Mechanics
Agnieszka Mykowiecka: University of Warsaw, Faculty of Mathematics, Informatics and Mechanics
Paweł Górecki: University of Warsaw, Faculty of Mathematics, Informatics and Mechanics

DOI: https://doi.org/10.1186/s13015-022-00218-8
Journal volume & issue: Vol. 17, no. 1, pp. 1–23

Abstract

Background: Phylogenetic networks are mathematical models of evolutionary processes involving reticulate events such as hybridization, recombination, or horizontal gene transfer. One of the crucial notions in phylogenetic network modelling is the displayed tree, which is obtained from a network by removing a set of reticulation edges. Displayed trees may represent the evolutionary history of a gene family if the evolution is shaped by reticulation events.

Results: We address the problem of inferring an optimal tree displayed by a network, given a gene tree G and a tree-child network N, under the deep coalescence and duplication costs. We propose an O(mn)-time dynamic programming algorithm (DP) to compute a lower bound on the optimal displayed tree cost, where m and n are the sizes of G and N, respectively. In addition, our algorithm can verify whether the solution is exact. Moreover, it provides a set of reticulation edges corresponding to the obtained cost. If the cost is exact, the set induces an optimal displayed tree. Otherwise, the set contains pairs of conflicting edges, i.e., edges sharing a reticulation node. Next, we show a conflict resolution algorithm that requires $2^{r+1}-1$ invocations of DP in the worst case, where r is the number of reticulations. We propose a similar $O(2^k mn)$-time algorithm for level-k tree-child networks and a branch and bound solution to compute lower and upper bounds of optimal costs. We also extend the algorithms to a broader class of phylogenetic networks. Based on simulated data, the average runtime is $\Theta(2^{0.543k}mn)$ under the deep-coalescence cost and $\Theta(2^{0.355k}mn)$ under the duplication cost.

Conclusions: Despite exponential complexity in the worst case, our algorithms perform well on empirical and simulated datasets, due to the strategy of resolving internal dissimilarities between gene trees and networks. Therefore, the algorithms are efficient alternatives to enumeration strategies commonly proposed in the literature and enable analyses of complex networks with dozens of reticulations.
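
The worst-case count of $2^{r+1}-1$ DP invocations quoted in the abstract matches the number of nodes in a full binary recursion tree of depth r. The Python sketch below illustrates that branching pattern only; it is not the authors' implementation. The names `resolve` and `make_worst_case_oracle` are hypothetical, and the oracle is a toy stand-in for DP: it assumes each reticulation contributes an independent non-negative cost, and it always reports the first unresolved reticulation as a conflict, which forces the worst case.

```python
from math import inf
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A partial resolution maps a reticulation id to the parent edge kept (0 or 1).
Resolution = Dict[int, int]
# Hypothetical stand-in for DP: returns (cost bound, conflicting reticulation or None).
Oracle = Callable[[Resolution], Tuple[float, Optional[int]]]


def make_worst_case_oracle(costs: Sequence[Tuple[float, float]]) -> Oracle:
    """Toy oracle (an assumption, not the paper's DP).

    costs[i][e] is the non-negative cost of keeping edge e at reticulation i.
    Unresolved reticulations contribute 0, so the value is a lower bound.
    """
    def oracle(fixed: Resolution) -> Tuple[float, Optional[int]]:
        bound = sum(costs[i][e] for i, e in fixed.items())
        unresolved = [i for i in range(len(costs)) if i not in fixed]
        # Always report a conflict while any reticulation is unresolved.
        return bound, (unresolved[0] if unresolved else None)
    return oracle


def resolve(
    oracle: Oracle,
    fixed: Optional[Resolution] = None,
    counter: Optional[List[int]] = None,
) -> Tuple[float, Resolution]:
    """Branch on conflicting reticulations until the oracle reports none.

    counter[0] is incremented once per oracle invocation.
    """
    fixed = {} if fixed is None else fixed
    counter = [0] if counter is None else counter
    counter[0] += 1
    cost, conflict = oracle(fixed)
    if conflict is None:
        # No conflict: the bound is exact for this resolution.
        return cost, dict(fixed)
    best: Tuple[float, Resolution] = (inf, {})
    for edge in (0, 1):
        # Keep one of the two conflicting edges and recurse.
        candidate = resolve(oracle, {**fixed, conflict: edge}, counter)
        if candidate[0] < best[0]:
            best = candidate
    return best
```

With r = 3 toy reticulations, passing `counter=[0]` to `resolve` records 15 oracle calls, that is $2^{3+1}-1$. A branch and bound variant would additionally skip a branch whose lower bound already exceeds the best exact cost found so far; the sketch omits this pruning.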

Published in Algorithms for Molecular Biology

ISSN: 1748-7188 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Genetics
Website: http://almob.biomedcentral.com
