Journal of Cheminformatics (Mar 2021)

SMPLIP-Score: predicting ligand binding affinity from simple and interpretable on-the-fly interaction fingerprint pattern descriptors

  • Surendra Kumar,
  • Mi-hyun Kim

DOI
https://doi.org/10.1186/s13321-021-00507-1
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 17

Abstract

Read online

Abstract In drug discovery, rapid and accurate prediction of protein–ligand binding affinities is a pivotal task for lead optimization with acceptable on-target potency as well as pharmacological efficacy. Furthermore, researchers hope for a high correlation between docking score and pose with key interactive residues, although scoring functions as free energy surrogates of protein–ligand complexes have failed to provide collinearity. Recently, various machine learning or deep learning methods have been proposed to overcome the drawbacks of scoring functions. Despite being highly accurate, their featurization process is complex and the meaning of the embedded features cannot directly be interpreted by human recognition without an additional feature analysis. Here, we propose SMPLIP-Score (Substructural Molecular and Protein–Ligand Interaction Pattern Score), a direct interpretable predictor of absolute binding affinity. Our simple featurization embeds the interaction fingerprint pattern on the ligand-binding site environment and molecular fragments of ligands into an input vectorized matrix for learning layers (random forest or deep neural network). Despite their less complex features than other state-of-the-art models, SMPLIP-Score achieved comparable performance, a Pearson’s correlation coefficient up to 0.80, and a root mean square error up to 1.18 in pK units with several benchmark datasets (PDBbind v.2015, Astex Diverse Set, CSAR NRC HiQ, FEP, PDBbind NMR, and CASF-2016). For this model, generality, predictive power, ranking power, and robustness were examined using direct interpretation of feature matrices for specific targets.

Keywords