PLoS Computational Biology (Feb 2021)

Bayesian parameter estimation for automatic annotation of gene functions using observational data and phylogenetic trees.

  • George G Vega Yon,
  • Duncan C Thomas,
  • John Morrison,
  • Huaiyu Mi,
  • Paul D Thomas,
  • Paul Marjoram

DOI
https://doi.org/10.1371/journal.pcbi.1007948
Journal volume & issue
Vol. 17, no. 2
p. e1007948

Abstract

Gene function annotation is important for a variety of downstream analyses of genetic data, but experimental characterization of function remains costly and slow, making computational prediction an important endeavor. Phylogenetic approaches to prediction have been developed, but implementing a practical Bayesian framework for parameter estimation remains an outstanding challenge. We have developed a computationally efficient model of the evolution of gene annotations on phylogenies, set in a Bayesian framework that uses Markov chain Monte Carlo (MCMC) for parameter estimation. Unlike previous approaches, our method can estimate parameters over many different phylogenetic trees and functions. The resulting parameter estimates agree with biological intuition, such as an increased probability of function change following gene duplication. The method performs well in leave-one-out cross-validation, and we further validated some of the predictions against the experimental scientific literature.