IEEE Access (Jan 2020)

Disease-Pathway Association Prediction Based on Random Walks With Restart and PageRank

  • Ali Ghulam,
  • Xiujuan Lei,
  • Min Guo,
  • Chen Bian

DOI
https://doi.org/10.1109/ACCESS.2020.2987071
Journal volume & issue
Vol. 8
pp. 72021 – 72038

Abstract

The study of disease-pathway association in human diseases is a perennial focus of the biomedical field. The association of diseases and pathways can help in the discovery of the mechanisms or relationships of human diseases. The accuracy of disease identification has been less than satisfactory despite decades of research in this area. Therefore, this study proposes a computational model for the prediction of disease-pathway associations. The proposed computational model is based on Random Walk with Restart on a heterogeneous network (RWRH) and PageRank. The RWRH disease-pathway association model is a novel computational model that can predict potential disease-pathway associations. Furthermore, the model can help pathologists understand the correlations among disease-pathway associations, treatments, and reactions. We performed a pathway-based study to expand disease variation relationships and to find new molecular correlations between genetic mutations. We constructed a biological network on the basis of shared gene interactions of disease-pathways and attempted to investigate the pathogenesis of a disease by analyzing the constructed network. The network construction had two parts. First, the pathway-pathway similarity was calculated. Second, a disease-disease (DD) similarity network was constructed, and the similarity between diseases was calculated. We also investigated the pathway seed nodes and disease seed nodes with high PageRank scores. Moreover, we focused on mining the complexity of disease-pathway associations. We used the bipartite network of disease-pathway associations to combine the obtained biological information, which was based on the pair similarity of sequence expression weights. These weights, which were obtained by using the multilayer resource-allocation algorithm, were used to calculate the prediction score of each disease-pathway pair.
Here, through leave-one-out cross-validation, we examined a $210\times1855$ matrix, with the 210 rows representing diseases and the 1855 columns representing pathways. The disease-pathway adjacency matrix contained 13,838 known disease-pathway associations. The best predictive result achieved an area under the curve (AUC) of 0.8218 and was also assessed with a two-class precision-recall curve. These results indicate that our method performs better than previously proposed methods. We predicted pathogen, DD, and disease-pathway relationships by comparing them with known associations and through publication search. We then proposed the possible reasons for our predictions.
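
As a rough illustration of the random-walk idea named in the abstract, the following minimal sketch runs a plain random walk with restart on a small toy graph. It is not the authors' RWRH model: the heterogeneous disease-pathway network, the similarity weights, and the PageRank step are all omitted, and the function names, the toy adjacency matrix, and the restart probability of 0.7 are assumptions chosen only for this example.

```python
def column_normalize(adj):
    """Scale each column of a square adjacency matrix so it sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol."""
    n = len(adj)
    w = column_normalize(adj)
    # The walker restarts uniformly over the seed nodes (hypothetical choice).
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy undirected path graph 0 - 1 - 2 - 3 (illustrative data, not from the paper).
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_adj, seeds=[0])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy setting the stationary scores decrease with distance from the seed node, so ranking nodes by score orders them by proximity to the seed. A heterogeneous variant would replace the single adjacency matrix with a block matrix that joins two similarity networks through a bipartite association matrix.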

Keywords