Cells (Nov 2024)

GPS-pPLM: A Language Model for Prediction of Prokaryotic Phosphorylation Sites

  • Chi Zhang,
  • Dachao Tang,
  • Cheng Han,
  • Yujie Gou,
  • Miaomiao Chen,
  • Xinhe Huang,
  • Dan Liu,
  • Miaoying Zhao,
  • Leming Xiao,
  • Qiang Xiao,
  • Di Peng,
  • Yu Xue

DOI
https://doi.org/10.3390/cells13221854
Journal volume & issue
Vol. 13, no. 22
p. 1854

Abstract


In the prokaryotic kingdom, protein phosphorylation serves as one of the most important posttranslational modifications (PTMs) and is involved in orchestrating a broad spectrum of biological processes. Here, we report an updated online server named the group-based prediction system for prokaryotic phosphorylation language model (GPS-pPLM), used for predicting phosphorylation sites (p-sites) in prokaryotes. For model training, two deep learning methods, a transformer and a deep neural network, were employed, and a total of 10 sequence features and contextual features were integrated. Using 44,839 nonredundant p-sites in 16,041 proteins from 95 prokaryotes, two general models for the prediction of O-phosphorylation and N-phosphorylation were first pretrained and then fine-tuned to construct 6 predictors specific for each phosphorylatable residue type as well as 134 species-specific predictors. Compared with other existing tools, the GPS-pPLM exhibits higher accuracy in predicting prokaryotic O-phosphorylation p-sites. Protein sequences in FASTA format or UniProt accession numbers can be submitted by users, and the predicted results are displayed in tabular form. In addition, we annotate the predicted p-sites with knowledge from 22 public resources, including experimental evidence, 3D structures, and disorder tendencies. The online service of the GPS-pPLM is freely accessible for academic research.
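
To make the input step concrete, the sketch below shows one way to turn FASTA-formatted sequences into candidate sites with flanking sequence windows, the kind of per-residue input a site predictor typically scores. It is not the GPS-pPLM implementation. The function names, the window half-width (`flank=7`), and the padding character are hypothetical choices made for illustration, and the example covers only the O-phosphorylation residues serine, threonine, and tyrosine.

```python
from typing import Dict, Iterator, List, Tuple

# Residues that carry a hydroxyl group and can be O-phosphorylated.
# Restricting to these three is an assumption of this sketch.
O_PHOSPHO_RESIDUES = {"S", "T", "Y"}


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>'.
    """
    records: Dict[str, List[str]] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts).upper() for name, parts in records.items()}


def candidate_sites(
    seq: str, flank: int = 7, pad: str = "-"
) -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based position, residue, window) for each S/T/Y in seq.

    Each window has length 2 * flank + 1 and is centered on the residue;
    positions beyond the sequence ends are filled with the pad character.
    """
    padded = pad * flank + seq + pad * flank
    for i, residue in enumerate(seq):
        if residue in O_PHOSPHO_RESIDUES:
            yield i + 1, residue, padded[i : i + 2 * flank + 1]


if __name__ == "__main__":
    demo = ">demo hypothetical protein\nMSTAY\n"
    for name, seq in parse_fasta(demo).items():
        for pos, res, window in candidate_sites(seq, flank=2):
            print(name, pos, res, window)
```

Each yielded window could then be encoded and scored by a model. The scoring step is deliberately left out, since the abstract does not specify the model's input encoding.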

Keywords