Computational and Structural Biotechnology Journal (Jan 2023)
Phosphoproteomics data-driven signalling network inference: Does it work?
Abstract
The advent of global phosphoproteome profiling has led to wide phosphosite coverage and therefore the opportunity to predict kinase–substrate associations from these datasets. However, the regulatory kinase is unknown for most substrates because database annotations are biased and incomplete. In this study, we compare the performance of six pairwise measures for predicting kinase–substrate associations using a data-driven approach on publicly available time-resolved and perturbation mass spectrometry-based phosphoproteome data. First, we validated the performance of these measures using as references both a literature-based phosphosite-specific protein interaction network and a set of predicted kinase–substrate (KS) interactions. The overall performance of pairwise measures in predicting kinase–substrate associations was poor across both reference sets. To expand into the wider interactome space, we applied the approach to a network comprising pairs of substrates regulated by the same kinase (substrate–substrate associations) but found the performance to be equally poor. However, the addition of a sequence similarity filter for substrate–substrate associations led to a significant boost in performance. Our findings imply that a filter that reduces the search space, such as a sequence similarity filter, can be applied before network inference methods to reduce noise and boost the signal. We also find that the current gold standard for reference sets is not adequate for evaluation, as it is limited and context-agnostic. Therefore, there is a need for additional evaluation methods that have increased coverage and take into consideration the context-specific nature of kinase–substrate associations.
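To make the evaluation setup concrete, the following is a minimal sketch of scoring phosphosite pairs with a pairwise measure and checking the ranking against a reference set. It is an illustration under stated assumptions, not the study's implementation: the abstract does not name the six measures, so absolute Pearson correlation stands in for one of them; the area under the ROC curve is assumed as the performance metric; and the site names, profiles, and reference pair are hypothetical toy values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; score it as 0.
    return cov / (sx * sy) if sx and sy else 0.0


def score_pairs(profiles):
    """Score every unordered pair of sites by absolute correlation."""
    return {
        frozenset((a, b)): abs(pearson(profiles[a], profiles[b]))
        for a, b in combinations(sorted(profiles), 2)
    }


def auroc(scores, reference):
    """Chance that a reference pair outscores a non-reference pair (ties count 0.5)."""
    pos = [s for pair, s in scores.items() if pair in reference]
    neg = [s for pair, s in scores.items() if pair not in reference]
    if not pos or not neg:
        raise ValueError("need both reference and non-reference pairs")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one profile per phosphosite across four time points.
profiles = {
    "KIN_A_S10": [0.0, 1.0, 2.0, 3.0],
    "SUB_B_T20": [0.1, 1.1, 2.1, 3.1],
    "SUB_C_Y30": [1.0, -1.0, 1.0, -1.0],
}
# Hypothetical reference set containing a single known association.
reference = {frozenset(("KIN_A_S10", "SUB_B_T20"))}

scores = score_pairs(profiles)
result = auroc(scores, reference)
print(result)
```

On this toy input the single reference pair ranks above both non-reference pairs. A search-space filter of the kind described above, such as a sequence similarity filter, would correspond to discarding candidate pairs before `auroc` is computed.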