Accurate prediction of protein–ligand interactions by combining physical energy functions and graph-neural networks

Yiyu Hong; Junsu Ha; Jaemin Sim; Chae Jo Lim; Kwang-Seok Oh; Ramakrishnan Chandrasekaran; Bomin Kim; Jieun Choi; Junsu Ko; Woong-Hee Shin; Juyong Lee

doi:10.1186/s13321-024-00912-2

Journal of Cheminformatics (Nov 2024)

Affiliations

Yiyu Hong: Arontier Co.
Junsu Ha: Arontier Co.
Jaemin Sim: Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University
Chae Jo Lim: Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology
Kwang-Seok Oh: Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology
Ramakrishnan Chandrasekaran: Arontier Co.
Bomin Kim: College of Pharmacy, Seoul National University
Jieun Choi: College of Pharmacy, Seoul National University
Junsu Ko: Arontier Co.
Woong-Hee Shin: Arontier Co.
Juyong Lee: Arontier Co.

Journal volume & issue: Vol. 16, no. 1
pp. 1 – 15

Abstract

We introduce a model for predicting protein–ligand interactions that combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine-learning models for protein–ligand binding prediction often fall short in practical virtual screening, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules, and the scarcity of crystallographic data for protein–ligand complexes. To overcome these limitations, we propose an approach that fuses three independent neural network models. A classification model performs binary prediction on a given protein–ligand complex pose, and two regression models are trained to predict, from an input complex structure, the binding affinity and the root-mean-square deviation (RMSD) of the ligand conformation. We trained the model to account both for the discrepancy between experimental and predicted binding affinities and for the uncertainty in pose prediction. By integrating the outputs of these three networks with a physics-based scoring function, our model showed significantly improved performance in hit identification. Benchmarks on three independent decoy sets show that our model outperformed existing models in forward screening: it achieved top 1% enrichment factors of 32.7 and 23.1 on the CASF2016 and DUD-E benchmark sets, respectively, and the LIT-PCBA benchmark further confirmed its higher average enrichment factors, underscoring the model's efficiency and generalizability. In an experimental screen for autotaxin inhibitors, the model identified 23 active compounds among 63 candidates, demonstrating its practical applicability in hit discovery.
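The top 1% enrichment factor (EF1%) quoted above measures how many times more actives appear in the top 1% of a score-ranked library than random selection would give: EF = (actives in top x% / compounds in top x%) / (total actives / total compounds). Below is a minimal Python sketch of that standard definition, assuming higher scores mean stronger predicted binding and that the top-x% cutoff is rounded to the nearest integer (benchmarks differ on this detail). The function name `enrichment_factor` is hypothetical and not taken from the paper.

```python
import math


def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at the top `fraction` of a ranked list.

    Assumes higher score = more likely to bind. The cutoff size is
    rounded to the nearest integer (an assumed convention), minimum 1.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")

    # Size of the top-x% subset of the ranked library.
    n_top = max(1, round(n_total * fraction))

    # Rank by score, best first; Python's sort is stable, so ties keep input order.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    actives_top = sum(1 for _, active in ranked[:n_top] if active)

    # Hit rate in the top subset divided by the hit rate of the whole library.
    return (actives_top / n_top) / (n_actives / n_total)
```

For example, in a library of 1,000 compounds with 10 actives, ranking all 10 actives at the top gives EF1% = (10/10) / (10/1000) = 100, the maximum attainable value for that active ratio; random ranking gives about 1.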
Scientific contribution

Our work introduces a new training strategy for a protein–ligand binding affinity prediction model that integrates the outputs of three independent sub-models and uses carefully constructed decoy sets. The model performs strongly across multiple benchmarks, and its high enrichment factors on the LIT-PCBA benchmark demonstrate its potential to accelerate hit discovery.
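The abstract does not state the rule used to combine the three sub-model outputs with the physics-based score. The sketch below is a purely hypothetical illustration of how such a fusion could be wired up: a weighted linear combination with made-up weights, where each ligand is scored by its best-scoring pose. The names `PoseScores`, `fused_score`, `rank_ligands` and `WEIGHTS`, the sign conventions, and the units are all assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class PoseScores:
    """Per-pose outputs (all fields hypothetical)."""
    pose_prob: float       # classifier output in [0, 1]; higher = more plausible pose
    pred_affinity: float   # regression output, e.g. pKd; higher = stronger binding
    pred_rmsd: float       # regression output, predicted ligand RMSD in angstroms; lower = better
    physics_energy: float  # physics-based score, e.g. kcal/mol; lower = better


# Hypothetical weights chosen for illustration only; NOT from the paper.
WEIGHTS = {"pose_prob": 1.0, "pred_affinity": 1.0, "pred_rmsd": 0.5, "physics_energy": 0.1}


def fused_score(s: PoseScores, w: dict = WEIGHTS) -> float:
    """Combine the four signals into one number where higher is better."""
    return (w["pose_prob"] * s.pose_prob
            + w["pred_affinity"] * s.pred_affinity
            - w["pred_rmsd"] * s.pred_rmsd            # penalize poses predicted far from native
            - w["physics_energy"] * s.physics_energy)  # lower (more negative) energy raises the score


def rank_ligands(candidates: dict[str, list[PoseScores]]) -> list[str]:
    """Rank ligand IDs by the fused score of each ligand's best pose, best first."""
    best = {lig: max(fused_score(p) for p in poses) for lig, poses in candidates.items()}
    return sorted(best, key=best.get, reverse=True)
```

Scoring each ligand by its best pose is one common choice when a docking program returns several poses per compound; averaging over poses or weighting them by `pose_prob` would be equally plausible alternatives under these assumptions.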

ISSN: 1758-2946 (Online)
Publisher: BMC
Country of publisher: United Kingdom
Website: https://jcheminf.biomedcentral.com/