IEEE Access (Jan 2024)

PhosHSGN: Deep Neural Networks Combining Sequence and Protein Spatial Information to Improve Protein Phosphorylation Site Prediction

  • Jiale Lu,
  • Haibin Chen,
  • Ji Qiu

DOI
https://doi.org/10.1109/ACCESS.2024.3427792
Journal volume & issue
Vol. 12
pp. 100611 – 100627

Abstract

Phosphorylation is one of the key protein post-translational modifications (PTMs), and predicting phosphorylation sites is an important research direction in bioinformatics that is of great significance for understanding protein function and signal transduction. Because determining sites experimentally is time-consuming and error-prone, computational prediction with artificial intelligence is needed. This study introduces a novel deep neural network named PhosHSGN, designed to identify and examine protein PTM sites. The model predicts phosphorylation by extracting the local sequence of the protein and incorporating global spatial information. To combine sequence and spatial information effectively for prediction, a graph neural network with residual connections is introduced. This network integrates the AlphaFold protein structure prediction module to construct a protein residue contact graph. Additionally, a pre-trained protein language model is employed to generate base extraction graph embeddings. Simultaneously, PhosHSGN incorporates a one-dimensional residual network to explore the sequence information of proteins. Experimental data were collected from PhosphoSitePlus, UniProt, GPS 5.0, and Phospho.ELM. Comparing PhosHSGN with PhosIDN and other state-of-the-art models on different datasets shows that PhosHSGN outperforms sequence-based methods on all metrics, with a sensitivity of 96.18%, an accuracy of 93.72%, and an MCC of 84.19% on the S/T dataset. On the Y dataset, the F1 score was 94.42% and the AUC was 96.19%.
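For readers unfamiliar with the reported metrics, the sketch below shows how sensitivity, accuracy, F1 score, and the Matthews correlation coefficient (MCC) are conventionally computed from binary confusion-matrix counts. The function name `binary_metrics` and the counts in the usage lines are hypothetical illustrations, not code or data from the paper, and the values are returned as fractions rather than percentages.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: phosphorylation sites predicted correctly / missed.
    tn/fp: non-sites predicted correctly / wrongly flagged as sites.
    Any metric whose denominator is zero is reported as 0.0 (an assumption
    made here for simplicity, not a convention taken from the paper).
    """
    def safe_div(num, den):
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)            # recall on true sites
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)
    return {
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Illustrative counts only (not results from the paper).
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike the other three metrics, MCC ranges from -1 to 1, so a percentage such as the 84.19% above presumably corresponds to a coefficient of about 0.84.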

Keywords