Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network

Buzhong Zhang; Linqing Li; Qiang Lü

doi:10.3390/biom8020033

Biomolecules (May 2018)

Affiliations

Buzhong Zhang: School of Computer Science and Technology, Soochow University, Suzhou 215006, China
Linqing Li: School of Computer Science and Technology, Soochow University, Suzhou 215006, China
Qiang Lü: School of Computer Science and Technology, Soochow University, Suzhou 215006, China

Journal volume & issue: Vol. 8, no. 2, p. 33

Abstract

Residue solvent accessibility is closely related to the spatial arrangement and packing of residues. Predicting the solvent accessibility of a protein is an important step toward understanding its structure and function. In this work, we present a deep learning method for predicting residue solvent accessibility, based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, we propose a merging operator that combines the bidirectional information from hidden nodes to produce the outputs. Three types of merging operators were used in our improved model, with a long short-term memory network serving as the hidden computing node. The training dataset was constructed from 7361 proteins extracted from the PISCES server using a sequence-identity cut-off of 25%. Each residue was represented by sequence-derived features, including the position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding. The method first predicts continuous relative solvent-accessible area values, which are then transformed into binary states using predefined thresholds. Our experimental results showed that the method improved prediction quality relative to current methods, with a mean absolute error and Pearson’s correlation coefficient of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.
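
The post-processing and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the 0.25 threshold, the "greater than or equal" convention for the exposed state, and the sample values are all assumptions made for illustration; the abstract only states that predefined thresholds are used and that quality is reported as mean absolute error and Pearson's correlation coefficient.

```python
import math


def binarize_rsa(rsa_values, threshold=0.25):
    """Map continuous relative solvent accessibility (0..1) to binary states.

    1 = exposed, 0 = buried. The 0.25 default is an assumed example value,
    not a threshold taken from the paper.
    """
    return [1 if value >= threshold else 0 for value in rsa_values]


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def pearson_correlation(predicted, observed):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum((p - mean_p) * (o - mean_o)
                     for p, o in zip(predicted, observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return covariance / (spread_p * spread_o)


# Made-up per-residue values, purely for demonstration.
predicted_rsa = [0.10, 0.40, 0.05, 0.80]
observed_rsa = [0.20, 0.30, 0.05, 0.70]

predicted_states = binarize_rsa(predicted_rsa)
observed_states = binarize_rsa(observed_rsa)
mae = mean_absolute_error(predicted_rsa, observed_rsa)
pcc = pearson_correlation(predicted_rsa, observed_rsa)
```

In this sketch the metrics are computed on the continuous values, while the binary states would feed a separate two-state accuracy measure. Reporting the metrics as percentages, as the abstract does, corresponds to multiplying these fractions by 100.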

Published in Biomolecules

ISSN: 2218-273X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Microbiology
Website: https://www.mdpi.com/journal/biomolecules
