Identifying Protein Complexes in Protein-Protein Interaction Data Using Graph Convolutional Network

Nazar Zaki; Harsh Singh; Elfadil A. Mohamed

doi:10.1109/ACCESS.2021.3110845

IEEE Access (Jan 2021)

Affiliations

Nazar Zaki: Big Data Analytics Center (BIDAC), United Arab Emirates University (UAEU), Al Ain, United Arab Emirates
Harsh Singh: Big Data Analytics Center (BIDAC), United Arab Emirates University (UAEU), Al Ain, United Arab Emirates
Elfadil A. Mohamed: College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates

Journal volume & issue: Vol. 9
pp. 123717 – 123726

Abstract

Protein complexes are groups of two or more polypeptide chains that bind to form noncovalent networks of protein interactions. Over the past decade, researchers have developed a number of computational methods for identifying protein complexes and their members from these interaction networks. Although most existing methods identify functional protein complexes from protein-protein interaction (PPI) networks reasonably well, the applicability of advanced graph network methods has not yet been adequately investigated. This paper proposes several graph convolutional network (GCN) methods to improve the detection of protein complexes. We first formulate protein complex detection as a node classification problem. We then develop a Neural Overlapping Community Detection (NOCD) model that clusters the nodes (proteins) using a complex affiliation matrix. We also use a representation learning approach that combines a multi-class GCN feature extractor (to obtain the node features) with the mean shift clustering algorithm (to perform the clustering). To improve the efficiency of the multi-class GCN, we convert dense-dense matrix operations into dense-sparse or sparse-sparse matrix operations, reducing its space and time complexity. This significantly improves the scalability of the existing GCN. Finally, we apply clustering aggregation to find the best protein complexes: a grid search is performed over the complexes detected by three well-known protein complex detection methods, namely ClusterONE, CMC, and PEWCC, with the help of the Meta-Clustering Algorithm (MCLA) and the Hybrid Bipartite Graph Formulation (HBGF). We test the proposed GCN-based methods on several publicly available datasets and find that they perform significantly better than previous state-of-the-art methods. The code and data are freely available for download from https://github.com/Analystharsh/GCN_complex_detection.
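The efficiency claim about dense-sparse operations comes down to never materializing the dense n x n adjacency matrix of the PPI network. The Python sketch below illustrates the idea for a single GCN layer. It assumes the widely used propagation rule H' = ReLU(A_hat H W) with A_hat = D^{-1/2}(A + I)D^{-1/2} (Kipf and Welling); whether the authors' multi-class GCN uses exactly this rule is an assumption here. The network is stored as an edge list, so the normalized adjacency is applied to the dense feature matrix in O(|E| * f) time and O(|E|) memory rather than O(n^2). The function names and the toy graph are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def normalized_edges(edges, n):
    """Edge-list (COO) form of A_hat = D^{-1/2} (A + I) D^{-1/2}.

    Assumption: a simple undirected graph, each interaction listed once,
    with no duplicate edges and no self-loops in `edges`.
    """
    e = np.asarray(edges, dtype=np.int64).reshape(-1, 2)
    loops = np.arange(n)
    # Store both directions of every undirected edge, plus one self-loop per node.
    rows = np.concatenate([e[:, 0], e[:, 1], loops])
    cols = np.concatenate([e[:, 1], e[:, 0], loops])
    # Degree of A + I equals the number of stored entries in each row.
    deg = np.bincount(rows, minlength=n).astype(float)
    vals = 1.0 / np.sqrt(deg[rows] * deg[cols])
    return rows, cols, vals


def sparse_dense_matmul(rows, cols, vals, dense, n):
    """Compute A_hat @ dense by visiting only the stored nonzeros."""
    out = np.zeros((n, dense.shape[1]))
    # Each stored entry (i, j, a_ij) adds a_ij * dense[j] to row i (unbuffered).
    np.add.at(out, rows, vals[:, None] * dense[cols])
    return out


def gcn_layer(rows, cols, vals, H, W):
    """One propagation step H' = ReLU(A_hat @ H @ W) with no dense n x n matrix."""
    HW = H @ W  # dense-dense, but only (n x f) times (f x h)
    return np.maximum(sparse_dense_matmul(rows, cols, vals, HW, H.shape[0]), 0.0)


if __name__ == "__main__":
    # Hypothetical toy PPI graph: 4 proteins, 4 interactions.
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
    rows, cols, vals = normalized_edges(edges, 4)
    H = np.eye(4)  # featureless nodes: one-hot identity features
    W = np.random.default_rng(0).normal(size=(4, 3))  # untrained weights
    print(gcn_layer(rows, cols, vals, H, W).shape)  # (4, 3)
```

Because PPI networks are typically very sparse, storing only the nonzero entries is what lets this step scale; in practice a library sparse format such as a `scipy.sparse` CSR matrix would take the place of the hand-rolled edge-list product shown here.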

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639