PLoS ONE (Jan 2014)
Building a hierarchical organization of protein complexes out of protein association data.
Abstract
Organizing experimentally determined protein associations as a hierarchy can be a good approach to elucidating the content of protein complexes and the modularity of subcomplexes. Several challenges exist. First, intrinsically sticky proteins, such as chaperones, are often falsely assigned to many functionally unrelated complexes. Second, the reported collections of proteins may not be true "complexes" in the sense that they bind together and perform a joint cellular function. Third, due to imperfect sensitivity of protein detection methods, both false positive and false negative assignments of a protein to complexes may occur. We mitigate the first issue by down-weighting sticky proteins by their occurrence frequencies. We approach the other two problems by merging nearly identical complexes and by constructing a directed acyclic graph (DAG) based on the relationship of partial inclusion. The constructed DAG, within which smaller complexes form parts of the larger, can reveal how different complexes are joined. By merging almost identical complexes one can deemphasize the influence of false positives, while allowing false negatives to be rescued by other nearly identical association data. We investigate several protein weighting schemes and compare their corresponding DAGs using yeast and human complexes. We find that the scheme incorporating weights based on information flow in the network of direct protein-protein interactions produces biologically most meaningful DAGs. In either yeast or human, isolated nodes form a large proportion of the final hierarchy. While most connected components encompass very few nodes, the largest one for each species contains a sizable portion of all nodes. By considering examples of subgraphs composed of nodes containing a specified protein, we illustrate that the graphs' topological features can correctly suggest the biological roles of protein complexes. The input data, final results and the source code are available at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/ProteinComplexDAG/.