Frontiers in Microbiology (Feb 2015)
Techniques for transferring host-pathogen protein interactions knowledge to new tasks
Abstract
We consider the problem of building a model to predict protein-protein interactions (PPIs) between the bacterial species Salmonella Typhimurium and the plant host Arabidopsis thaliana, a host-pathogen pair for which no known PPIs are available. To achieve this, we present approaches that use homology and statistical learning methods called 'transfer learning'. In the transfer learning setting, the prediction of PPIs between Arabidopsis and its pathogen S. Typhimurium is called the 'target task'. The approaches utilize labeled data, i.e., known PPIs of other host-pathogen pairs (we call these PPIs the 'source tasks'). The homology-based approaches use heuristics based on biological intuition to predict PPIs. The transfer learning methods use the similarity of the PPIs from the source tasks to the target task to build an effective model. For a quantitative evaluation, we consider Salmonella-mouse PPI prediction and some other host-pathogen tasks where known PPIs exist. We use metrics such as precision and recall, and our results show that our methods perform well on the target task in various transfer settings. We present a brief qualitative analysis of the predicted Arabidopsis-Salmonella interactions. We filter the predictions from all approaches using Gene Ontology term enrichment and retain only those interactions involving Salmonella effectors. We thereby observe that Arabidopsis proteins involved in, e.g., transcriptional regulation, hormone-mediated signaling, and defense response may be affected by Salmonella.
Keywords