Physical Review Research (Sep 2021)
Understanding protein-complex assembly through grand canonical maximum entropy modeling
Abstract
Inside a cell, heterotypic proteins assemble in inhomogeneous, crowded systems where the abundance of these proteins vary with cell types. While some protein complexes form putative structures that can be visualized with imaging, there are far more protein complexes that are yet to be solved because of their dynamic associations with one another. Nevertheless, it is possible to infer these protein complexes through a physical model. However, it is often not clear to physicists what kind of data from biology is necessary for such a modeling endeavor. Here, we aim to model these clusters of coarse-grained protein assemblies from multiple subunits through the constraints of interactions among the subunits and the chemical potential of each subunit. We obtained the constraints on the interactions among subunits from the known protein structures. We inferred the chemical potential that dictates the particle number distribution of each protein subunit from the knowledge of protein abundance from experimental data. Guided by the maximum entropy principle, we formulated an inverse statistical mechanical method to infer the distribution of particle numbers from the data of protein abundance as chemical potentials for a grand canonical multicomponent mixture. Using grand canonical Monte Carlo simulations, we captured a distribution of high-order clusters in a protein complex of succinate dehydrogenase with four known subunits. The complexity of hierarchical clusters varies with the relative protein abundance of each subunit in distinctive cell types such as lung, heart, and brain. When the crowding content increases, we observed that crowding stabilizes emergent clusters that do not exist in dilute conditions. We, therefore, proposed a testable hypothesis that the hierarchical complexity of protein clusters on a molecular scale is a plausible biomarker of predicting the phenotypes of a cell.