Mathematical Biosciences and Engineering (Jan 2022)
UPFPSR: a ubiquitylation predictor for plant through combining sequence information and random forest
Abstract
As one of the most significant protein post-translational modifications (PTMs) in eukaryotes, ubiquitylation plays an essential role in regulating diverse cellular functions, such as apoptosis, cell division, DNA repair and replication, intracellular transport and immune reactions. Traditional experimental methods for identifying ubiquitylation sites are time-consuming, costly and labor-intensive. Automated computational methods that can recognize potential ubiquitylation sites rapidly and accurately are therefore highly desirable. In this study, we propose a novel predictor, named UPFPSR, for predicting lysine ubiquitylation sites in plants. UPFPSR is built on multiple physicochemical properties of amino acids and sequence-based statistical information. To find a suitable classification algorithm, we compared four traditional algorithms and two deep learning networks, and ultimately selected random forest, which showed the best performance. Extensive benchmarking shows that UPFPSR outperforms the most advanced ubiquitylation prediction tool on every evaluation metric, achieving an accuracy of 77.3%, a precision of 75%, a recall of 81.7%, an F1-score of 0.7824, and an AUC of 0.84 on the independent test dataset. These results indicate that UPFPSR can provide new guidance for further experimental study of ubiquitylation. The data sets and source code used in this study are freely available at https://github.com/ysw-sunshine/UPFPSR.
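For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, precision, recall and F1-score are conventionally derived from the confusion counts of a binary site classifier. It is not taken from the UPFPSR source code: the names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts in the usage example are invented for illustration, not the study's results. AUC is omitted because it depends on ranked prediction scores rather than on a single confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Confusion counts for a binary classifier (hypothetical helper type)."""
    tp: int  # ubiquitylated sites predicted as ubiquitylated
    fp: int  # non-ubiquitylated sites predicted as ubiquitylated
    tn: int  # non-ubiquitylated sites predicted as non-ubiquitylated
    fn: int  # ubiquitylated sites predicted as non-ubiquitylated


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall and F1 using the standard definitions."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; these are NOT the counts from the study.
    example = ConfusionCounts(tp=8, fp=2, tn=7, fn=3)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is consistent with the reported F1-score falling between the reported precision and recall.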
Keywords