BMC Bioinformatics (Nov 2011)

Correlated mutations via regularized multinomial regression

  • Sreekumar Janardanan,
  • ter Braak Cajo JF,
  • van Ham Roeland CHJ,
  • van Dijk Aalt DJ

DOI
https://doi.org/10.1186/1471-2105-12-444
Journal volume & issue
Vol. 12, no. 1
p. 444

Abstract

Read online

Abstract Background In addition to sequence conservation, protein multiple sequence alignments contain evolutionary signal in the form of correlated variation among amino acid positions. This signal indicates positions in the sequence that influence each other, and can be applied for the prediction of intra- or intermolecular contacts. Although various approaches exist for the detection of such correlated mutations, in general these methods utilize only pairwise correlations. Hence, they tend to conflate direct and indirect dependencies. Results We propose RMRCM, a method for Regularized Multinomial Regression in order to obtain Correlated Mutations from protein multiple sequence alignments. Importantly, our method is not restricted to pairwise (column-column) comparisons only, but takes into account the network nature of relationships between protein residues in order to predict residue-residue contacts. The use of regularization ensures that the number of predicted links between columns in the multiple sequence alignment remains limited, preventing overprediction. Using simulated datasets we analyzed the performance of our approach in predicting residue-residue contacts, and studied how it is influenced by various types of noise. For various biological datasets, validation with protein structure data indicates a good performance of the proposed algorithm for the prediction of residue-residue contacts, in comparison to previous results. RMRCM can also be applied to predict interactions (in addition to only predicting interaction sites or contact sites), as demonstrated by predicting PDZ-peptide interactions. Conclusions A novel method is presented, which uses regularized multinomial regression in order to obtain correlated mutations from protein multiple sequence alignments. Availability R-code of our implementation is available via http://www.ab.wur.nl/rmrcm