IEEE Access (Jan 2020)

Deep Neighbor Information Learning From Evolution Trees for Phylogenetic Likelihood Estimates

  • Cheng Ling,
  • Wenhao Cheng,
  • Haoyu Zhang,
  • Hanhao Zhu,
  • Hua Zhang

DOI
https://doi.org/10.1109/ACCESS.2020.3043150
Journal volume & issue
Vol. 8
pp. 220692 – 220702

Abstract

Likelihood-based phylogenetic analysis approaches have contributed to impressive advances in minimizing the variance of evolutionary parameter estimates. However, their practical application is greatly limited by the very time-consuming calculation of Conditional Likelihood Probabilities (CLPs). Obtaining the likelihoods of massive numbers of tree samples accurately and quickly can facilitate the phylogenetic analysis process. Inspired by recent advances in machine learning techniques that have greatly improved the performance of many related prediction tasks, this study proposes a Random Forest (RF) based learning and prediction approach called NeoPLE. The approach first learns the deep neighbor information between nodes from the topology representations of evolution trees, integrates likelihood information from these trees, and trains a non-linear prediction model. Instead of depending on the recursive calculation of the CLPs of tree nodes, NeoPLE replaces that process with a prediction by the trained model, so the likelihood estimates become independent of the CLP calculations. In terms of performance, speedup factors ranging from 2.1X to 3.5X are achieved on the analysis of realistic data sets. Moreover, NeoPLE is well suited to data sets with a relatively large number of alignment sites: speedup factors of up to 27.5X are achieved on the analysis of simulated data sets. In addition, NeoPLE is robust across a wide range of evolutionary models and is ready to be integrated into more phylogenetic inference tools. This study fills a gap in phylogenetic analysis by applying a machine learning approach to feature representation and likelihood prediction of evolution trees, which has not previously been reported in the literature.
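
For context, the recursive CLP calculation that the abstract describes NeoPLE as avoiding is commonly carried out with a post-order (pruning) recursion over the tree. The sketch below is not taken from the paper. It is a minimal illustration for a single alignment site, assuming the Jukes-Cantor (JC69) substitution model with uniform base frequencies and a hypothetical nested-list tree encoding; the names `jc69_prob`, `clp`, and `site_likelihood` are illustrative only.

```python
import math

STATES = "ACGT"

def jc69_prob(i, j, t):
    """JC69 probability of observing state j after branch length t, starting from state i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def clp(node):
    """Conditional likelihood vector (one entry per state) for `node`.

    Hypothetical encoding: a leaf is a one-character string such as "A";
    an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        # Leaf: probability 1 for the observed state, 0 for the others.
        return [1.0 if s == node else 0.0 for s in STATES]
    vec = [1.0] * len(STATES)
    for child, t in node:
        child_vec = clp(child)  # recurse into the subtree first
        for i, si in enumerate(STATES):
            # Sum over all possible child states, weighted by transition probability.
            vec[i] *= sum(
                jc69_prob(si, sj, t) * child_vec[j]
                for j, sj in enumerate(STATES)
            )
    return vec

def site_likelihood(root):
    """Likelihood of one site: root CLPs weighted by uniform base frequencies."""
    return sum(0.25 * x for x in clp(root))

# Example: ((A:0.1, A:0.1):0.05, C:0.2)
tree = [([("A", 0.1), ("A", 0.1)], 0.05), ("C", 0.2)]
likelihood = site_likelihood(tree)
```

This recursion has to be repeated for every alignment site and every sampled tree, which is the cost that a trained prediction model, as proposed in the abstract, is meant to sidestep.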

Keywords