BMC Bioinformatics (Sep 2017)

Phylo_dCor: distance correlation as a novel metric for phylogenetic profiling

  • Gabriella Sferra,
  • Federica Fratini,
  • Marta Ponzi,
  • Elisabetta Pizzi

DOI
https://doi.org/10.1186/s12859-017-1815-5
Journal volume & issue
Vol. 18, no. 1
pp. 1 – 7

Abstract

Background: Developing powerful methods to predict functional and/or physical protein-protein interactions from genome sequence is one of the main tasks of the post-genomic era. Phylogenetic profiling allows protein-protein interactions to be predicted at the whole-genome level in both Prokaryotes and Eukaryotes, and for this reason it is considered one of the most promising methods.

Results: Here, we propose an improvement of phylogenetic profiling that enables the handling of large genomic datasets and the inference of global protein-protein interactions. The method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms previously described phylogenetic profiling methods that use mutual information or Pearson's correlation as the measure of profile similarity.

Conclusions: In this work, we constructed and assessed robust reference sets and propose the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.
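
To illustrate the similarity measure named above, the following is a minimal sketch of the standard sample distance correlation applied to two phylogenetic profiles. It is not the authors' Phylo-dCor implementation, which is parallelized and distributed as R scripts. The function names and the toy presence/absence profiles are hypothetical, and the code runs serially in plain Python.

```python
from math import sqrt


def _centered_distances(values):
    """Return the double-centered pairwise distance matrix of a 1-D profile."""
    n = len(values)
    dist = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    # The distance matrix is symmetric, so row means equal column means.
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length profiles (0 to 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    # Squared sample distance covariance and distance variances.
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = sqrt(dvar_x * dvar_y)
    if denom == 0:
        # A constant profile carries no information; return 0 by convention.
        return 0.0
    return sqrt(max(dcov2, 0.0) / denom)


# Hypothetical toy profiles: one entry per genome (1 = homolog present).
profile_a = [1, 0, 1, 1, 0, 1, 0, 0]
profile_b = [1, 0, 1, 0, 0, 1, 0, 1]
score = distance_correlation(profile_a, profile_b)
```

A score near 1 indicates strongly dependent profiles and a score near 0 indicates little dependence. The quadratic cost of each pairwise comparison is one reason a parallel implementation is attractive for whole-genome datasets.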

Keywords