HybridGCN for protein solubility prediction with adaptive weighting of multiple features

Long Chen; Rining Wu; Feixiang Zhou; Huifeng Zhang; Jian K. Liu

doi:10.1186/s13321-023-00788-8

Journal of Cheminformatics (Dec 2023)

Affiliations

Long Chen: Readline Intelligence
Rining Wu: Readline Intelligence
Feixiang Zhou: Readline Intelligence
Huifeng Zhang: Readline Intelligence
Jian K. Liu: School of Computer Science, University of Birmingham

DOI: https://doi.org/10.1186/s13321-023-00788-8
Journal volume & issue: Vol. 15, no. 1
pp. 1 – 13

Abstract

The solubility of proteins stands as a pivotal factor in the realm of pharmaceutical research and production. Addressing the imperative to enhance production efficiency and curtail experimental costs, the demand arises for computational models adept at accurately predicting solubility based on provided datasets. Prior investigations have leveraged deep learning models and feature engineering techniques to distill features from raw protein sequences for solubility prediction. However, these methodologies have not thoroughly delved into the interdependencies among features or their respective magnitudes of significance. This study introduces HybridGCN, a pioneering Hybrid Graph Convolutional Network that elevates solubility prediction accuracy through the combination of diverse features, encompassing sophisticated deep-learning features and classical biophysical features. An exploration into the intricate interplay between deep-learning features and biophysical features revealed that specific biophysical attributes, notably evolutionary features, complement features extracted by advanced deep-learning models. Augmenting the model’s capability for feature representation, we employed ESM, a substantial protein language model, to derive a zero-shot learning feature capturing comprehensive and pertinent information concerning protein functions and structures. Furthermore, we proposed a novel feature fusion module termed Adaptive Feature Re-weighting (AFR) to integrate multiple features, thereby enabling the fine-tuning of feature importance. Ablation experiments and comparative analyses attest to the efficacy of the HybridGCN approach, culminating in state-of-the-art performances on the public eSOL and S. cerevisiae datasets.
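
The abstract does not specify how Adaptive Feature Re-weighting is implemented, so the following is only a generic illustration of the idea of weighting several feature groups before fusing them, not the authors' AFR module. It assumes one scalar gate per feature group, normalised with a softmax, and fusion by concatenation; the function names, array shapes, and random stand-in features are all hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def reweight_and_fuse(feature_blocks, gate_logits):
    """Scale each feature block by a softmax-normalised gate, then concatenate.

    feature_blocks: list of arrays shaped (n_residues, d_k), one per feature group.
    gate_logits: one score per block (hypothetical; in a trained model these
                 would be learnable parameters rather than fixed inputs).
    """
    gates = softmax(np.asarray(gate_logits, dtype=float))
    scaled = [g * block for g, block in zip(gates, feature_blocks)]
    return np.concatenate(scaled, axis=1), gates

rng = np.random.default_rng(0)
n_residues = 5
embedding = rng.normal(size=(n_residues, 8))     # stand-in for language-model features
evolutionary = rng.normal(size=(n_residues, 4))  # stand-in for evolutionary features

# Equal logits give equal gates; unequal logits would favour one feature group.
fused, gates = reweight_and_fuse([embedding, evolutionary], gate_logits=[0.0, 0.0])
```

In this sketch the fused array keeps one row per residue, so it could serve as a node-feature matrix for a graph model; how HybridGCN actually builds and consumes its features is described in the full paper, not here.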

Published in Journal of Cheminformatics

ISSN: 1758-2946 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Chemistry
Website: https://jcheminf.biomedcentral.com/
