BMC Bioinformatics (Dec 2012)

Learning virulent proteins from integrated query networks

  • Cadag Eithon,
  • Tarczy-Hornoch Peter,
  • Myler Peter J

DOI
https://doi.org/10.1186/1471-2105-13-321
Journal volume & issue
Vol. 13, no. 1
p. 321

Abstract

Background
Methods of weakening and attenuating pathogens’ abilities to infect and propagate in a host, thus allowing the natural immune system to more easily decimate invaders, have gained attention as alternatives to broad-spectrum targeting approaches. The following work describes a technique for identifying proteins involved in virulence by relying on latent information computationally gathered across biological repositories, applicable to both generic and specific virulence categories.

Results
A lightweight method for data integration is used, which links information regarding a protein via a path-based query graph. A weighting method is then applied to the query graphs so that they can serve as input to various statistical classification methods for discrimination, and the combined use of data integration and learning methods is tested on both generalized and specific virulence function prediction.

Conclusions
This approach improves coverage of functional data for a protein. Moreover, while depending largely on noisy and potentially non-curated data from public sources, we find it outperforms other techniques for identifying general virulence factors, as well as baseline remote homology detection methods for specific virulence categories.
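
The abstract does not specify how query graphs are weighted, so the following is only a minimal sketch of one way a weighted, path-based query graph could be flattened into classifier input. Everything in it is an assumption made for illustration rather than the authors' method: the toy graph, the node names, the edge confidences, the rule that a node's weight is the best product of edge confidences along any path from the query protein, and the `path_weights` and `feature_vector` helpers.

```python
# Hypothetical query graph: node -> list of (neighbor, edge_confidence).
# The seed is the query protein; other nodes stand for records reached by
# following cross-references between (made-up) data sources.
QUERY_GRAPH = {
    "query_protein": [("homolog_A", 0.9), ("homolog_B", 0.5)],
    "homolog_A": [("term:adhesion", 0.8)],
    "homolog_B": [("term:adhesion", 0.6), ("term:toxin", 0.7)],
}


def path_weights(graph, seed):
    """Assumed weighting rule: a node's weight is the largest product of
    edge confidences over all paths leading to it from the seed."""
    best = {seed: 1.0}
    stack = [seed]
    while stack:
        node = stack.pop()
        for neighbor, confidence in graph.get(node, []):
            weight = best[node] * confidence
            # Revisit a neighbor only when a strictly better path is found.
            if weight > best.get(neighbor, 0.0):
                best[neighbor] = weight
                stack.append(neighbor)
    return best


def feature_vector(graph, seed, vocabulary):
    """Map a query graph onto a fixed vocabulary of annotation terms;
    terms that were never reached get weight 0.0."""
    weights = path_weights(graph, seed)
    return [weights.get(term, 0.0) for term in vocabulary]


VOCABULARY = ["term:adhesion", "term:toxin", "term:secretion"]
features = feature_vector(QUERY_GRAPH, "query_protein", VOCABULARY)
print(dict(zip(VOCABULARY, features)))
```

A vector of this kind, one per protein, could then be passed to any standard statistical classifier. Taking the maximum over paths is just one design choice; summing over paths or penalizing path length would be equally plausible under the same assumptions.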