Machine Learning: Science and Technology (Jan 2024)

SICGNN: structurally informed convolutional graph neural networks for protein classification

  • YongHyun Lee,
  • Eunchan Kim,
  • Jiwoong Choi,
  • Changhyun Lee

DOI
https://doi.org/10.1088/2632-2153/ad979b
Journal volume & issue
Vol. 5, no. 4
p. 045072

Abstract

Graph neural networks (GNNs) are now widely used in domains such as social networks, recommender systems, protein classification, molecular property prediction, and genetic networks. In bioinformatics and chemical engineering, considerable research represents molecules or proteins as graphs, treating atoms or amino acids as nodes and the relationships between them as edges. The overall structures of proteins and their interconnections are crucial for predicting and classifying their properties. However, as GNNs stack more layers to form deeper networks, node embeddings can become excessively similar, an oversmoothing problem that degrades performance on downstream tasks. To avoid this, GNNs typically use a limited number of layers, so they reflect only local structure and neighborhood information rather than the global structure of the graph. We therefore propose a structurally informed convolutional GNN (SICGNN) that uses information expressing the overall topological structure of a protein graph during GNN training and prediction. By explicitly including information about the entire graph topology, the proposed model can use both local neighborhood and global structural information. We applied SICGNN to representative GNNs such as GraphSAGE, graph isomorphism network, and graph attention network, and confirmed performance improvements across various datasets. We also demonstrate the robustness of SICGNN using multiple stratified 10-fold cross-validations and various hyperparameter settings, and show that its accuracy is comparable to or better than that of existing GNN models.
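
The oversmoothing effect described in the abstract can be illustrated with a minimal sketch. This is not the SICGNN model or any code from the paper: it assumes a toy six-node path graph, a single scalar feature per node, and plain mean aggregation with self-loops and no learned weights. The names mean_aggregate, spread_after, NEIGHBORS, and FEATURES are hypothetical and introduced only for this illustration.

```python
def mean_aggregate(features, neighbors):
    """Apply one round of mean aggregation over each node and its neighbors.

    Each node's new value is the average of its own value (self-loop)
    and the values of its neighbors. No learned weights are involved.
    """
    updated = []
    for node, nbrs in enumerate(neighbors):
        group = [node] + nbrs
        updated.append(sum(features[j] for j in group) / len(group))
    return updated


def spread_after(num_layers, features, neighbors):
    """Return max - min of the node features after num_layers rounds."""
    for _ in range(num_layers):
        features = mean_aggregate(features, neighbors)
    return max(features) - min(features)


# Toy input (assumption): a path graph 0-1-2-3-4-5 with distinct features.
NEIGHBORS = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
FEATURES = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

if __name__ == "__main__":
    # The spread shrinks as depth grows: node values drift toward one another.
    for depth in (0, 2, 8, 32):
        print(depth, round(spread_after(depth, FEATURES, NEIGHBORS), 4))
```

In this toy setting, the gap between the largest and smallest node value shrinks with depth, which is the sense in which deep stacks make embeddings "excessively similar". A shallow stack avoids that collapse but only mixes information from nearby nodes, which is the limitation the abstract says explicit global structural information is meant to address.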

Keywords