Algorithms for Molecular Biology (Oct 2009)

Grammatical-Restrained Hidden Conditional Random Fields for Bioinformatics applications

  • Martelli Pier,
  • Savojardo Castrense,
  • Fariselli Piero,
  • Casadio Rita

DOI
https://doi.org/10.1186/1748-7188-4-13
Journal volume & issue
Vol. 4, no. 1
p. 13

Abstract

Background: Discriminative models are designed to naturally address classification tasks. However, some applications require the inclusion of grammar rules, and in these cases generative models, such as Hidden Markov Models (HMMs) and Stochastic Grammars, are routinely applied.

Results: We introduce Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs) as an extension of Hidden Conditional Random Fields (HCRFs). GRHCRFs preserve the discriminative character of HCRFs while assigning labels in agreement with the production rules of a defined grammar. The main novelty of GRHCRFs is that prior knowledge of the problem can be included in HCRFs by means of a defined grammar. Our current implementation allows regular grammar rules. We test our GRHCRF on a typical biosequence labeling problem: the prediction of the topology of prokaryotic outer-membrane proteins.

Conclusion: We show that, in a typical biosequence labeling problem, the GRHCRF performs better than CRF models of the same complexity, indicating that GRHCRFs can be useful tools for biosequence analysis applications.

Availability: GRHCRF software is available under the GPLv3 licence at http://www.biocomp.unibo.it/~savojard/biocrf-0.9.tar.gz.
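To make the idea of assigning labels "in agreement with the production rules" of a regular grammar more concrete, the following is a minimal sketch of grammar-constrained Viterbi decoding. It is not the authors' implementation and does not use the biocrf package: the function name `constrained_viterbi`, the four-state toy grammar (inner loop, upward strand, outer loop, downward strand) and all scores are assumptions made up for illustration. It only illustrates decoding under a grammar, not how a GRHCRF is trained.

```python
NEG_INF = float("-inf")


def constrained_viterbi(scores, allowed, start_states, end_states, state_label):
    """Return the best label sequence whose state path obeys the grammar.

    scores[t][s]  : log-space score of hidden state s at position t (made up here)
    allowed       : set of (previous_state, next_state) pairs the grammar permits
    start_states  : states in which a path may begin
    end_states    : states in which a path may end
    state_label   : maps each hidden state to its output label
    """
    n = len(scores)
    states = list(state_label)
    best = [{s: NEG_INF for s in states} for _ in range(n)]
    back = [{s: None for s in states} for _ in range(n)]

    for s in start_states:
        best[0][s] = scores[0][s]

    for t in range(1, n):
        for s in states:
            for p in states:
                # Skip transitions the grammar forbids and unreachable predecessors.
                if (p, s) not in allowed or best[t - 1][p] == NEG_INF:
                    continue
                cand = best[t - 1][p] + scores[t][s]
                if cand > best[t][s]:
                    best[t][s] = cand
                    back[t][s] = p

    last = max(end_states, key=lambda s: best[n - 1][s])
    if best[n - 1][last] == NEG_INF:
        raise ValueError("no state path satisfies the grammar")

    # Follow the back-pointers from the final state to recover the path.
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [state_label[s] for s in path]


# Hypothetical toy grammar: inner loop -> strand -> outer loop -> strand -> inner loop.
state_label = {"i": "i", "t_up": "t", "o": "o", "t_down": "t"}
allowed = {
    ("i", "i"), ("i", "t_up"),
    ("t_up", "t_up"), ("t_up", "o"),
    ("o", "o"), ("o", "t_down"),
    ("t_down", "t_down"), ("t_down", "i"),
}

# Made-up per-position scores; at position 3 the locally best state is "i",
# which the grammar does not allow directly after "o".
scores = [
    {"i": 0, "t_up": -5, "o": -5, "t_down": -5},
    {"i": -5, "t_up": 0, "o": -5, "t_down": -1},
    {"i": -5, "t_up": -5, "o": 0, "t_down": -5},
    {"i": 0, "t_up": -5, "o": -5, "t_down": -1},
    {"i": 0, "t_up": -5, "o": -5, "t_down": -5},
]

labels = constrained_viterbi(scores, allowed, ["i"], ["i"], state_label)
print("".join(labels))  # -> itoti
```

In this toy case, picking the best-scoring state independently at each position would give `itoii`, which jumps from the outer loop straight to the inner loop. Restricting transitions to those in `allowed` yields `itoti`, a labeling the grammar accepts.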