PTG-PLM: Predicting Post-Translational Glycosylation and Glycation Sites Using Protein Language Models and Deep Learning

Alhasan Alkuhlani; Walaa Gad; Mohamed Roushdy; Michael Gr. Voskoglou; Abdel-badeeh M. Salem

doi:10.3390/axioms11090469

Axioms (Sep 2022)

Affiliations

Alhasan Alkuhlani: Faculty of Computer and Information Technology, Sana’a University, Sana’a 1247, Yemen
Walaa Gad: Faculty of Computer and Information Science, Ain Shams University, Cairo 11566, Egypt
Mohamed Roushdy: Faculty of Computers and Information Technology, Future University in Egypt, New Cairo 11835, Egypt
Michael Gr. Voskoglou: Department of Applied Mathematics, Graduate Technological Educational Institute of Western Greece, 22334 Patras, Greece
Abdel-badeeh M. Salem: Faculty of Computer and Information Science, Ain Shams University, Cairo 11566, Egypt

DOI: https://doi.org/10.3390/axioms11090469
Journal volume & issue: Vol. 11, no. 9, p. 469

Abstract

Post-translational glycosylation and glycation are common types of protein post-translational modifications (PTMs) in which a glycan binds to a protein enzymatically or nonenzymatically, respectively. They are associated with various diseases, such as coronavirus disease, Alzheimer’s disease, cancer, and diabetes. Identifying glycosylation and glycation sites is important for understanding their biological mechanisms. However, identifying PTM sites with experimental laboratory tools is time-consuming and costly. In contrast, computational methods based on machine learning are becoming increasingly essential for PTM site prediction because of their higher performance and lower cost. In recent years, advances in deep-learning-based Transformer language models have been transferred from Natural Language Processing (NLP) to proteomics through the development of language models for protein sequence representation, known as Protein Language Models (PLMs). In this work, we propose a novel method, PTG-PLM, to improve the performance of glycosylation and glycation site prediction. PTG-PLM is based on convolutional neural networks (CNNs) and embeddings extracted from six recent PLMs: ProtBert-BFD, ProtBert, ProtAlbert, ProtXlnet, ESM-1b, and TAPE. The model is trained and evaluated on two public datasets for glycosylation and glycation site prediction. The results show that the ESM-1b and ProtBert-BFD variants of PTG-PLM perform better than those based on the other PLMs. Comparisons with existing tools and representative supervised learning methods show that PTG-PLM outperforms the other models for glycosylation and glycation site prediction. The strong performance of PTG-PLM indicates that it can also be used to predict sites of other types of PTMs.
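The abstract describes PTG-PLM as a CNN applied to embeddings extracted from a PLM. As a rough illustration of that general idea only, the pure-Python sketch below scores one candidate residue: it takes a window of per-residue embedding vectors centred on the site, applies one 1D convolution with ReLU, global max pooling, and a sigmoid output. Everything here is an assumption, not the authors' architecture: the symmetric zero-padded window, the window and kernel sizes, the single convolution layer, and all function names (`extract_window`, `conv1d_relu`, `global_max_pool`, `predict_site`, `random_params`) are hypothetical. Random vectors stand in for real ESM-1b or ProtBert-BFD embeddings, and the weights are untrained.

```python
import math
import random


def extract_window(embeddings, site, half_width):
    """Return the 2*half_width+1 per-residue vectors centred on `site`.
    Positions outside the sequence are zero-padded (assumed scheme)."""
    dim = len(embeddings[0])
    window = []
    for pos in range(site - half_width, site + half_width + 1):
        if 0 <= pos < len(embeddings):
            window.append(embeddings[pos])
        else:
            window.append([0.0] * dim)
    return window


def conv1d_relu(window, filters, bias):
    """'Valid' 1D convolution along the residue axis, followed by ReLU.
    filters[f][k][d] is the weight of filter f at offset k for feature d."""
    kernel = len(filters[0])
    out = []
    for start in range(len(window) - kernel + 1):
        row = []
        for f, weights in enumerate(filters):
            total = bias[f]
            for off in range(kernel):
                vec = window[start + off]
                total += sum(w * x for w, x in zip(weights[off], vec))
            row.append(max(0.0, total))  # ReLU
        out.append(row)
    return out


def global_max_pool(feature_map):
    """Maximum activation of each filter over all window positions."""
    return [max(col) for col in zip(*feature_map)]


def predict_site(embeddings, site, params, half_width=3):
    """Probability (sigmoid output) that `site` is modified, under the
    hypothetical single-layer CNN described above."""
    window = extract_window(embeddings, site, half_width)
    fmap = conv1d_relu(window, params["filters"], params["conv_bias"])
    pooled = global_max_pool(fmap)
    logit = params["out_bias"] + sum(
        w * p for w, p in zip(params["out_weights"], pooled))
    return 1.0 / (1.0 + math.exp(-logit))


def random_params(dim, n_filters=4, kernel=3, seed=0):
    """Untrained random weights; a real model would learn these."""
    rng = random.Random(seed)
    return {
        "filters": [[[rng.gauss(0, 0.1) for _ in range(dim)]
                     for _ in range(kernel)] for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "out_weights": [rng.gauss(0, 0.1) for _ in range(n_filters)],
        "out_bias": 0.0,
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-in "embeddings": 10 residues, 8 features each.
    emb = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(10)]
    print(predict_site(emb, 5, random_params(8)))
```

In a trained model the weights would be fitted on labelled positive and negative sites, and the embedding dimension would match the chosen PLM; the zero padding simply lets sites near either terminus be scored with a window of the same length.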

Published in Axioms

ISSN: 2075-1680 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/axioms
