Learnt representations of proteins can be used for accurate prediction of small molecule binding sites on experimentally determined and predicted protein structures

Anna Carbery; Martin Buttenschoen; Rachael Skyner; Frank von Delft; Charlotte M. Deane

doi:10.1186/s13321-024-00821-4

Journal of Cheminformatics (Mar 2024)

Affiliations

Anna Carbery: Oxford Protein Informatics Group, Department of Statistics, University of Oxford
Martin Buttenschoen: Oxford Protein Informatics Group, Department of Statistics, University of Oxford
Rachael Skyner: OMass Therapeutics
Frank von Delft: Diamond Light Source, Harwell Science and Innovation Campus
Charlotte M. Deane: Oxford Protein Informatics Group, Department of Statistics, University of Oxford

Journal volume & issue: Vol. 16, no. 1
pp. 1 – 17

Abstract

Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient for understanding performance on novel protein targets, for which experimental structures are not available. An alternative is to provide computationally predicted protein structures as input, but this is not commonly tested. However, because of the data used to train structure prediction models, computationally predicted protein structures tend to be extremely accurate and are often biased toward a holo conformation. In this study we describe and benchmark IF-SitePred, a protein-ligand binding site prediction method based on labelling ESM-IF1 protein language model embeddings, combined with point cloud annotation and clustering. We show that IF-SitePred is not only competitive with state-of-the-art methods when predicting binding sites on experimental structures, but also performs better on proxies for novel proteins in which low accuracy has been simulated by molecular dynamics. Finally, IF-SitePred outperforms other methods when ensembles of predicted protein structures are generated.
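The abstract summarises IF-SitePred as per-residue labelling of ESM-IF1 embeddings followed by point cloud annotation and clustering. The sketch below illustrates only that generic post-processing idea, under loudly stated assumptions; it is not the authors' implementation. Per-residue binding scores are taken as given (in IF-SitePred they would come from models applied to ESM-IF1 embeddings; here they are hypothetical numbers), each residue is reduced to a single coordinate, surface points are placed on spheres around residues, points near high-scoring residues are kept, and the kept points are grouped by single-linkage clustering, a simple stand-in rather than the clustering method used in the paper. All function names, radii, cutoffs and thresholds are illustrative assumptions.

```python
import math
from collections import deque

Point = tuple[float, float, float]


def fibonacci_sphere(center: Point, radius: float, n: int) -> list[Point]:
    """Place n roughly evenly spaced points on a sphere around center."""
    cx, cy, cz = center
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n          # height in (-1, 1)
        ring = math.sqrt(1.0 - y * y)          # radius of the horizontal ring
        theta = golden_angle * i
        points.append((cx + radius * ring * math.cos(theta),
                       cy + radius * y,
                       cz + radius * ring * math.sin(theta)))
    return points


def surface_points(coords: list[Point], radius: float = 3.0, n: int = 30) -> list[Point]:
    """Point cloud: sphere points not buried inside another residue's sphere."""
    cloud = []
    for i, c in enumerate(coords):
        for p in fibonacci_sphere(c, radius, n):
            if all(math.dist(p, o) >= radius for j, o in enumerate(coords) if j != i):
                cloud.append(p)
    return cloud


def annotate(cloud, coords, scores, cutoff=4.0, threshold=0.5):
    """Keep points within `cutoff` of a residue whose (assumed) score >= threshold."""
    hot = [c for c, s in zip(coords, scores) if s >= threshold]
    return [p for p in cloud if any(math.dist(p, h) <= cutoff for h in hot)]


def cluster(points, eps=3.0, min_size=5):
    """Single-linkage clustering: connected components joined by gaps <= eps."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:
            k = queue.popleft()
            near = [m for m in unvisited if math.dist(points[k], points[m]) <= eps]
            for m in near:
                unvisited.remove(m)
                queue.append(m)
                members.append(m)
        if len(members) >= min_size:
            clusters.append([points[m] for m in members])
    return sorted(clusters, key=len, reverse=True)  # largest candidate site first


def centroid(points):
    """Mean position of a cluster, usable as a predicted site centre."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(3))


# Toy example: three high-scoring residues and two low-scoring ones far away.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0),
          (30.0, 0.0, 0.0), (33.0, 0.0, 0.0)]
scores = [0.9, 0.8, 0.7, 0.1, 0.2]  # hypothetical per-residue probabilities
sites = cluster(annotate(surface_points(coords), coords, scores))
for site in sites:
    print(len(site), centroid(site))
```

Single linkage is used here only because it needs no third-party dependencies; a density-based method such as DBSCAN would be a natural alternative, and the paper itself should be consulted for the point generation and clustering settings IF-SitePred actually uses.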

Published in Journal of Cheminformatics

ISSN: 1758-2946 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Chemistry
Website: https://jcheminf.biomedcentral.com/
