Artificial Intelligence in the Life Sciences (Dec 2024)

Machine learning proteochemometric models for Cereblon glue activity predictions

  • Francis J. Prael, III,
  • Jiayi Cox,
  • Noé Sturm,
  • Peter Kutchukian,
  • William C. Forrester,
  • Gregory Michaud,
  • Jutta Blank,
  • Lingling Shen,
  • Raquel Rodríguez-Pérez

Journal volume & issue
Vol. 6
p. 100100

Abstract

Read online

Targeted protein degradation (TPD) is a rapidly developing drug discovery technique with unique efficacy and target scope stemming from its degradation-based activity. Molecular glue degraders are a promising arm of TPD, as evidenced by the FDA-approved therapeutics within this class, the increasing number of degraders in clinical development, and their predisposition to drug-likeness. Cereblon (CRBN) glue degraders mediate target degradation by generating a neomorphic interface between CRBN and a protein of interest. While promising, the complicated nature of this CRBN-glue-target ternary complex makes the rational design of molecular glue degraders challenging. For other drug modalities, predictive modeling has been established to leverage existing activity data and generate quantitative structure-activity relationships (QSAR). However, the applicability of QSAR strategies for glues remains under-investigated. Herein, machine learning methodologies were developed to predict glue-mediated recruitment of CRBN to target proteins and achieved promising performance. Generated models leveraged more than a hundred internal screening campaigns across thousands of CRBN glues to predict glue-mediated recruitment of targets to CRBN. Our results show that recruitment activity of CRBN glue degraders can be modeled by machine learning, with 89 % of models producing an area under the receiver operating characteristic curve (ROC AUC) > 0.8 and 70 % of models producing a Matthew's correlation coefficient (MCC) > 0.2 for these primary screening data. Importantly, our findings also indicate that the combination of compound and protein descriptors in the so-called proteochemometric models improves performance, with >80 % of the models exhibiting higher ROC AUC and MCC values than per-target models only based on compound information. Hence, our investigations suggest that proteochemometric modeling is a successful approach for molecular glue degraders. The proposed machine learning strategies can aid compound prioritization based on recruitment efficacy and target selectivity, thus have the potential to facilitate the design and discovery of therapeutic CRBN molecular glues.

Keywords