Journal of Cell and Molecular Research (Dec 2016)
Unravelling Over-Represented Amino Acids in Protein Structure of Allergen Proteins; A Large-Scale Study
Abstract
Allergens are proteins or glycoproteins that cause widespread disorders, which can lead to systemic anaphylactic shock and even death within a short period of time. Understanding the protein features involved in allergenicity is important for developing future treatments as well as for engineering proteins in genetic transformation projects. A large dataset of 1439 protein features from 761 plant allergens and 7815 non-allergen proteins was constructed. Thereafter, 10 different attribute weighting algorithms were used to find the key characteristics differentiating allergen from non-allergen proteins. The frequencies of Leu, Arg and Gln were selected by several attribute weighting algorithms with more than 50% confidence, including Weight_Info Gain, Weight_Chi Squared, Weight_Gini Index and Weight_Relief. A high percentage of Gln and low percentages of Leu and Arg discriminate plant allergens from non-allergens.
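As a rough illustration of what attribute weighting by information gain measures, the following minimal sketch computes the percentage of one amino acid in each sequence and scores how well a single cut-off on that percentage separates the two classes. It is not the pipeline used in the study: the function names, the four toy sequences and the 30% threshold are all hypothetical and chosen only to make the example self-contained.

```python
import math
from collections import Counter


def aa_percent(sequence, residue):
    """Percentage of `residue` (one-letter code) in a protein sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * seq.count(residue) / len(seq)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def info_gain(values, labels, threshold):
    """Information gain of splitting `labels` at `values <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    total = len(labels)
    weighted = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(labels) - weighted


# Invented toy data (NOT from the study): 1 = allergen, 0 = non-allergen.
toy = [("QQQQPQQA", 1), ("QQPFQQQG", 1), ("LLRALLKR", 0), ("ALRLGLRV", 0)]
labels = [label for _, label in toy]
gln = [aa_percent(seq, "Q") for seq, _ in toy]

# Arbitrary illustrative cut-off of 30% Gln.
gain = info_gain(gln, labels, threshold=30.0)
print(gln, gain)
```

In this toy set the Gln percentage separates the classes perfectly, so the gain equals the full class entropy; on real data the weight would lie between zero and that maximum, and a feature would be ranked against all others by this score.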
Keywords