Algorithms for Molecular Biology (Jan 2008)

Reconstructing phylogenies from noisy quartets in polynomial time with a high success probability

  • Wu Gang,
  • Kao Ming-Yang,
  • Lin Guohui,
  • You Jia-Huai

DOI
https://doi.org/10.1186/1748-7188-3-1
Journal volume & issue
Vol. 3, no. 1
p. 1

Abstract

Read online

Abstract Background In recent years, quartet-based phylogeny reconstruction methods have received considerable attentions in the computational biology community. Traditionally, the accuracy of a phylogeny reconstruction method is measured by simulations on synthetic datasets with known "true" phylogenies, while little theoretical analysis has been done. In this paper, we present a new model-based approach to measuring the accuracy of a quartet-based phylogeny reconstruction method. Under this model, we propose three efficient algorithms to reconstruct the "true" phylogeny with a high success probability. Results The first algorithm can reconstruct the "true" phylogeny from the input quartet topology set without quartet errors in O(n2) time by querying at most (n - 4) log(n - 1) quartet topologies, where n is the number of the taxa. When the input quartet topology set contains errors, the second algorithm can reconstruct the "true" phylogeny with a probability approximately 1 - p in O(n4 log n) time, where p is the probability for a quartet topology being an error. This probability is improved by the third algorithm to approximately 11+q2+12q4+116q5 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqcfa4aaSaaaeaacqaIXaqmaeaacqaIXaqmcqGHRaWkcqWGXbqCdaahaaqabeaacqaIYaGmaaGaey4kaSYaaSaaaeaacqaIXaqmaeaacqaIYaGmaaGaemyCae3aaWbaaeqabaGaeGinaqdaaiabgUcaRmaalaaabaGaeGymaedabaGaeGymaeJaeGOnaydaaiabdghaXnaaCaaabeqaaiabiwda1aaaaaaaaa@3D5A@, where q=p1−p MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyCaeNaeyypa0tcfa4aaSaaaeaacqWGWbaCaeaacqaIXaqmcqGHsislcqWGWbaCaaaaaa@3391@, with running time of O(n5), which is at least 0.984 when p Conclusion The three proposed algorithms are mathematically guaranteed to reconstruct the "true" phylogeny with a high success probability. The experimental results showed that the third algorithm produced phylogenies with a higher probability than its aforementioned theoretical lower bound and outperformed some existing phylogeny reconstruction methods in both speed and accuracy.