BMC Bioinformatics (Sep 2017)
A machine learning approach for predicting methionine oxidation sites
Abstract
Abstract Background The oxidation of protein-bound methionine to form methionine sulfoxide, has traditionally been regarded as an oxidative damage. However, recent evidences support the view of this reversible reaction as a regulatory post-translational modification. The perception that methionine sulfoxidation may provide a mechanism to the redox regulation of a wide range of cellular processes, has stimulated some proteomic studies. However, these experimental approaches are expensive and time-consuming. Therefore, computational methods designed to predict methionine oxidation sites are an attractive alternative. As a first approach to this matter, we have developed models based on random forests, support vector machines and neural networks, aimed at accurate prediction of sites of methionine oxidation. Results Starting from published proteomic data regarding oxidized methionines, we created a hand-curated dataset formed by 113 unique polypeptides of known structure, containing 975 methionyl residues, 122 of which were oxidation-prone (positive dataset) and 853 were oxidation-resistant (negative dataset). We use a machine learning approach to generate predictive models from these datasets. Among the multiple features used in the classification task, some of them contributed substantially to the performance of the predictive models. Thus, (i) the solvent accessible area of the methionine residue, (ii) the number of residues between the analyzed methionine and the next methionine found towards the N-terminus and (iii) the spatial distance between the atom of sulfur from the analyzed methionine and the closest aromatic residue, were among the most relevant features. Compared to the other classifiers we also evaluated, random forests provided the best performance, with accuracy, sensitivity and specificity of 0.7468±0.0567, 0.6817±0.0982 and 0.7557±0.0721, respectively (mean ± standard deviation). Conclusions We present the first predictive models aimed to computationally detect methionine sites that may become oxidized in vivo in response to oxidative signals. These models provide insights into the structural context in which a methionine residue become either oxidation-resistant or oxidation-prone. Furthermore, these models should be useful in prioritizing methinonyl residues for further studies to determine their potential as regulatory post-translational modification sites.
Keywords