IEEE Access (Jan 2019)

DeepGly: A Deep Learning Framework With Recurrent and Convolutional Neural Networks to Identify Protein Glycation Sites From Imbalanced Data

  • Jingui Chen,
  • Runtao Yang,
  • Chengjin Zhang,
  • Lina Zhang,
  • Qian Zhang

DOI
https://doi.org/10.1109/ACCESS.2019.2944411
Journal volume & issue
Vol. 7
pp. 142368 – 142378

Abstract

As an unavoidable non-enzymatic reaction between proteins and reducing sugars, glycation can impair antioxidant defense mechanisms, damage cellular organelles, and produce advanced glycation end products (AGEs), thereby contributing to a range of destructive physiological diseases. Identifying and analyzing protein glycation sites will help in understanding the complex pathogenesis related to glycation. In this paper, a new glycation site predictor, DeepGly, is proposed based on a deep learning framework that combines a recurrent neural network (RNN) and a convolutional neural network (CNN). First, to address the class imbalance in the benchmark dataset, Long Short-Term Memory (LSTM) RNNs are designed to generate artificial peptides containing glycation sites, which together with the original data form a balanced dataset. Then, the peptides in the balanced dataset are cleaved into a series of biological words, and a continuous distributed representation is employed to transform these words into numerical vectors. Finally, the vectors are fed into a CNN in which multiple convolution kernels automatically extract diverse features, pooling layers perform feature selection, and a softmax function classifies the peptides. Under 10-fold cross-validation on the same datasets, DeepGly substantially outperforms existing methods, indicating that the proposed method is a strong choice for protein glycation site prediction and may also benefit other related fields.
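The word-cleavage and CNN classification stages described in the abstract can be illustrated with a small, untrained sketch. This is not DeepGly's implementation. It is a minimal illustration under stated assumptions, and each of the following choices is hypothetical: overlapping 3-residue words with stride 1, random vectors standing in for the learned continuous distributed representation, kernel widths of 2, 3, and 4, and the helper names (`to_words`, `score_peptide`, and so on). The LSTM-based generation of artificial glycation-site peptides is not shown.

```python
import math
import random

def to_words(peptide, k=3):
    """Cleave a peptide into overlapping k-mers ("biological words"), stride 1 (assumed scheme)."""
    if not 1 <= k <= len(peptide):
        raise ValueError("k must be between 1 and len(peptide)")
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]

def random_embeddings(words, dim, rng):
    """Stand-in for learned word vectors: one random vector per distinct word."""
    table = {}
    for w in words:
        if w not in table:
            table[w] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return table

def conv_max_pool(seq, kernel):
    """Valid 1-D convolution over the word vectors, ReLU, then global max pooling."""
    width, dim = len(kernel), len(seq[0])
    activations = []
    for start in range(len(seq) - width + 1):
        s = sum(kernel[j][d] * seq[start + j][d]
                for j in range(width) for d in range(dim))
        activations.append(max(0.0, s))
    return max(activations)

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def score_peptide(peptide, k=3, dim=4, widths=(2, 3, 4), filters_per_width=2, seed=0):
    """Untrained forward pass: words -> vectors -> multi-width convolutions -> max pooling -> softmax.

    The word sequence must be at least max(widths) long.
    """
    rng = random.Random(seed)  # fixed seed so the random weights are reproducible
    words = to_words(peptide, k)
    table = random_embeddings(words, dim, rng)
    seq = [table[w] for w in words]
    features = []
    for width in widths:
        for _ in range(filters_per_width):
            kernel = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(width)]
            features.append(conv_max_pool(seq, kernel))
    # Dense layer mapping pooled features to two classes (non-glycation, glycation).
    weights = [[rng.uniform(-1.0, 1.0) for _ in features] for _ in range(2)]
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)

if __name__ == "__main__":
    print(to_words("MKVLAAGIK")[:3])  # ['MKV', 'KVL', 'VLA']
    probs = score_peptide("MKVLAAGIKKLAE")  # hypothetical peptide window
    print(probs)  # two class probabilities from untrained, random weights
```

Using several kernel widths lets the network respond to word patterns of different lengths, and global max pooling keeps only the strongest response of each kernel. This mirrors, in simplified form, the abstract's description of multiple convolution kernels followed by pooling-based feature selection. In a real model, the embeddings and weights would be learned from the balanced training data rather than drawn at random.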

Keywords