BMC Bioinformatics (Feb 2011)
A Computational Framework for Proteome-Wide Pursuit and Prediction of Metalloproteins using ICP-MS and MS/MS Data
Abstract
Abstract Background Metal-containing proteins comprise a diverse and sizable category within the proteomes of organisms, ranging from proteins that use metals to catalyze reactions to proteins in which metals play key structural roles. Unfortunately, reliably predicting that a protein will contain a specific metal from its amino acid sequence is not currently possible. We recently developed a generally-applicable experimental technique for finding metalloproteins on a genome-wide scale. Applying this metal-directed protein purification approach (ICP-MS and MS/MS based) to the prototypical microbe Pyrococcus furiosus conclusively demonstrated the extent and diversity of the uncharacterized portion of microbial metalloproteomes since a majority of the observed metal peaks could not be assigned to known or predicted metalloproteins. However, even using this technique, it is not technically feasible to purify to homogeneity all metalloproteins in an organism. In order to address these limitations and complement the metal-directed protein purification, we developed a computational infrastructure and statistical methodology to aid in the pursuit and identification of novel metalloproteins. Results We demonstrate that our methodology enables predictions of metal-protein interactions using an experimental data set derived from a chromatography fractionation experiment in which 870 proteins and 10 metals were measured over 2,589 fractions. For each of the 10 metals, cobalt, iron, manganese, molybdenum, nickel, lead, tungsten, uranium, vanadium, and zinc, clusters of proteins frequently occurring in metal peaks (of a specific metal) within the fractionation space were defined. This resulted in predictions that there are from 5 undiscovered vanadium- to 13 undiscovered cobalt-containing proteins in Pyrococcus furiosus. Molybdenum and nickel were chosen for additional assessment producing lists of genes predicted to encode metalloproteins or metalloprotein subunits, 22 for nickel including seven from known nickel-proteins, and 20 for molybdenum including two from known molybdo-proteins. The uncharacterized proteins are prime candidates for metal-based purification or recombinant approaches to validate these predictions. Conclusions We conclude that the largely uncharacterized extent of native metalloproteomes can be revealed through analysis of the co-occurrence of metals and proteins across a fractionation space. This can significantly impact our understanding of metallobiochemistry, disease mechanisms, and metal toxicity, with implications for bioremediation, medicine and other fields.