GPS-pPLM: A Language Model for Prediction of Prokaryotic Phosphorylation Sites
Chi Zhang,
Dachao Tang,
Cheng Han,
Yujie Gou,
Miaomiao Chen,
Xinhe Huang,
Dan Liu,
Miaoying Zhao,
Leming Xiao,
Qiang Xiao,
Di Peng,
Yu Xue
Affiliations
Chi Zhang
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Dachao Tang
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Cheng Han
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Yujie Gou
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Miaomiao Chen
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Xinhe Huang
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Dan Liu
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Miaoying Zhao
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Leming Xiao
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Qiang Xiao
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
Di Peng
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Yu Xue
Department of Bioinformatics and Systems Biology, MOE Key Laboratory of Molecular Biophysics, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
In the prokaryotic kingdom, protein phosphorylation serves as one of the most important posttranslational modifications (PTMs) and is involved in orchestrating a broad spectrum of biological processes. Here, we report an updated online server named the group-based prediction system for prokaryotic phosphorylation language model (GPS-pPLM), used for predicting phosphorylation sites (p-sites) in prokaryotes. For model training, two deep learning methods, a transformer and a deep neural network, were employed, and a total of 10 sequence features and contextual features were integrated. Using 44,839 nonredundant p-sites in 16,041 proteins from 95 prokaryotes, two general models for the prediction of O-phosphorylation and N-phosphorylation were first pretrained and then fine-tuned to construct 6 predictors specific for each phosphorylatable residue type as well as 134 species-specific predictors. Compared with other existing tools, the GPS-pPLM exhibits higher accuracy in predicting prokaryotic O-phosphorylation p-sites. Protein sequences in FASTA format or UniProt accession numbers can be submitted by users, and the predicted results are displayed in tabular form. In addition, we annotate the predicted p-sites with knowledge from 22 public resources, including experimental evidence, 3D structures, and disorder tendencies. The online service of the GPS-pPLM is freely accessible for academic research.