Computational and Structural Biotechnology Journal (Dec 2024)
Protein allosteric site identification using machine learning and per amino acid residue reported internal protein nanoenvironment descriptors
Abstract
Allosteric regulation plays a crucial role in modulating protein functions and represents a promising strategy in drug development, offering enhanced specificity and reduced toxicity compared to traditional active site inhibition. Existing computational methods for predicting allosteric sites on proteins often rely on static protein surface pocket features, normal mode analysis or extensive molecular dynamics simulations encompassing both the protein function modulator and the protein itself. In this study, we introduce an innovative methodology that employs a per amino acid residue classifier to distinguish allosteric site-forming residues (AFRs) from non-allosteric, or free, residues (FRs). Our model, STINGAllo, exhibits robust performance: it achieves a Distance Center Center (DCC) success rate of 78 % when all AFRs are predicted within pockets identified by FPocket, an overall DCC success rate of 60 %, an F1 score of 64 % and a Matthews correlation coefficient (MCC) of 64 %. Furthermore, we identified key descriptors that characterize the internal protein nanoenvironment of AFRs, setting them apart from FRs. These descriptors include the sponge effect, distance to the protein centre of geometry (cg), hydrophobic interactions, electrostatic potentials, eccentricity, and graph bottleneck features.
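For readers unfamiliar with the reported metrics, the following minimal Python sketch shows how F1, MCC and a DCC-style distance are commonly computed for a per-residue binary classifier (1 = AFR, 0 = FR). It is an illustration under stated assumptions, not the STINGAllo implementation: the function names, the toy labels and coordinates, and the distance cut-off `ILLUSTRATIVE_DCC_CUTOFF` are all hypothetical and are not taken from the study.

```python
import math

# Hypothetical cut-off (in angstroms) used only for this illustration;
# the abstract does not state which threshold defines a DCC success.
ILLUSTRATIVE_DCC_CUTOFF = 4.0


def f1_and_mcc(y_true, y_pred):
    """Return (F1, MCC) for binary per-residue labels (1 = AFR, 0 = FR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc


def centroid(coords):
    """Geometric centre of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def dcc(pred_coords, true_coords):
    """Distance between the centre of the predicted site and the true site."""
    return math.dist(centroid(pred_coords), centroid(true_coords))


def dcc_success_rate(distances, cutoff=ILLUSTRATIVE_DCC_CUTOFF):
    """Fraction of proteins whose DCC is at or below the cut-off."""
    return sum(1 for d in distances if d <= cutoff) / len(distances)


# Toy data, invented for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
f1, mcc = f1_and_mcc(y_true, y_pred)

pred_site = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
true_site = [(1.0, 3.0, 0.0), (1.0, 5.0, 0.0)]
site_distance = dcc(pred_site, true_site)
```

MCC uses all four cells of the confusion matrix, which makes it a useful complement to F1 when AFRs are far rarer than FRs, as is typical for per-residue site prediction.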