A Novel Method for Identifying Essential Proteins Based on Non-negative Matrix Tri-Factorization

Zhihong Zhang; Meiping Jiang; Dongjie Wu; Wang Zhang; Wei Yan; Xilong Qu

doi:10.3389/fgene.2021.709660

Frontiers in Genetics (Aug 2021)

Affiliations

Zhihong Zhang: College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
Zhihong Zhang: School of Information Technology and Management, Hunan University of Finance and Economics, Changsha, China
Meiping Jiang: Department of Ultrasound, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, China
Dongjie Wu: Department of Banking and Finance, Monash University, Clayton, VIC, Australia
Wang Zhang: Department of Optoelectronic Engineering, Jinan University, Guangzhou, China
Wei Yan: College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
Xilong Qu: School of Information Technology and Management, Hunan University of Finance and Economics, Changsha, China
Xilong Qu: Hunan Provincial Key Laboratory of Finance and Economics Big Data Science and Technology, Hunan University of Finance and Economics, Changsha, China

Journal volume & issue: Vol. 12

Abstract

Identification of essential proteins is important for understanding the basic requirements for sustaining a living organism. In recent years, there has been increasing interest in computational methods that predict essential proteins from protein–protein interaction (PPI) networks, either alone or fused with multiple sources of biological information. However, existing PPI data are known to contain both false-negative and false-positive interactions. Fusing multiple sources of biological information can reduce the influence of these false interactions, but it inevitably introduces additional noise at the same time. In this article, we propose a novel non-negative matrix tri-factorization (NMTF)-based model, NTMEP, to predict essential proteins. First, a weighted PPI network is constructed using only the topological features of the network, so as to avoid introducing extra noise. Then, to reduce the influence of the false data in the PPI network on essential protein identification, the NMTF technique, which is widely used in recommendation algorithms, is applied to reconstruct an optimized PPI network that contains more potential protein–protein interactions. Finally, the PageRank algorithm is used to compute the final ranking score of each protein, with initial scores calculated from the subcellular localization and homology information of the proteins. Extensive experiments on publicly available datasets show that NTMEP performs better at predicting essential proteins than state-of-the-art methods. These results demonstrate that introducing non-negative matrix tri-factorization can effectively improve the quality of the protein–protein interaction network and thereby reduce the negative impact of noise on prediction. This finding also offers a new perspective for other applications based on protein–protein interaction networks.
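The two computational steps named in the abstract (NMTF reconstruction of the PPI matrix, then PageRank seeded with initial protein scores) can be illustrated with a minimal NumPy sketch. This is not the NTMEP implementation: the function names `nmtf`, `propagate_scores` and `rank_proteins`, the rank `k`, the damping factor `alpha=0.85`, the choice to use the symmetrized reconstruction directly as the new network, and the random stand-in for the localization/homology-based initial scores are all hypothetical. The paper's topology-based edge weighting is not reproduced; the input is assumed to be an already weighted, symmetric, non-negative matrix `W`.

```python
import numpy as np


def nmtf(A, k, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix A (n x m) as U @ S @ V.T.

    Uses Lee-Seung-style multiplicative updates for the squared Frobenius
    error; all three factors stay non-negative because they start positive
    and are only ever multiplied by non-negative ratios.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    U = rng.random((n, k))
    S = rng.random((k, k))
    V = rng.random((m, k))
    for _ in range(iters):
        U *= (A @ V @ S.T) / (U @ S @ (V.T @ V) @ S.T + eps)
        V *= (A.T @ U @ S) / (V @ S.T @ (U.T @ U) @ S + eps)
        S *= (U.T @ A @ V) / ((U.T @ U) @ S @ (V.T @ V) + eps)
    return U, S, V


def propagate_scores(W, p0, alpha=0.85, tol=1e-10, max_iter=1000):
    """Personalized PageRank: s = alpha * M @ s + (1 - alpha) * p.

    M is W with each column divided by its sum; p is the normalized
    initial-score vector used as the restart distribution.
    """
    col_sums = W.sum(axis=0)
    # Columns with zero sum (isolated proteins) are left as all zeros.
    M = np.divide(W, col_sums, out=np.zeros_like(W, dtype=float),
                  where=col_sums > 0)
    p = p0 / p0.sum()
    s = p.copy()
    for _ in range(max_iter):
        s_new = alpha * (M @ s) + (1 - alpha) * p
        if np.abs(s_new - s).sum() < tol:
            s = s_new
            break
        s = s_new
    return s / s.sum()


def rank_proteins(W, p0, k=10, alpha=0.85):
    """Hypothetical pipeline: reconstruct W with NMTF, then rank by PageRank."""
    U, S, V = nmtf(W, k)
    R = U @ S @ V.T
    R = (R + R.T) / 2          # PPI networks are undirected
    np.fill_diagonal(R, 0.0)   # ignore self-interactions
    scores = propagate_scores(R, p0, alpha)
    return np.argsort(-scores), scores  # most likely essential first


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 30
    A = (rng.random((n, n)) < 0.15).astype(float)
    W = np.triu(A, 1)
    W = W + W.T                # toy symmetric network without self-loops
    p0 = rng.random(n)         # stand-in for localization/homology scores
    order, scores = rank_proteins(W, p0, k=5)
    print("top 10 candidate proteins:", order[:10])
```

Using the dense reconstruction `R` as-is fills in every possible interaction with a small weight; a real pipeline would more plausibly merge `R` with the original weights or threshold it, and would choose `k` and `alpha` by validation against known essential proteins.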

Published in Frontiers in Genetics

ISSN: 1664-8021 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Biology (General): Genetics
Website: http://journal.frontiersin.org/journal/genetics