Machine Learning Modeling of Protein-intrinsic Features Predicts Tractability of Targeted Protein Degradation

Wubing Zhang; Shourya S. Roy Burman; Jiaye Chen; Katherine A. Donovan; Yang Cao; Chelsea Shu; Boning Zhang; Zexian Zeng; Shengqing Gu; Yi Zhang; Dian Li; Eric S. Fischer; Collin Tokheim; X. Shirley Liu

Genomics, Proteomics & Bioinformatics (Oct 2022)

Affiliations

Wubing Zhang: Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
Shourya S. Roy Burman: Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
Jiaye Chen: Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
Katherine A. Donovan: Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
Yang Cao: Center of Growth, Metabolism, and Aging, Key Laboratory of Bio-resource and Eco-environment, Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610064, China
Chelsea Shu: Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Research Scholar Initiative, Graduate School of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
Boning Zhang: Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
Zexian Zeng: Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
Shengqing Gu: Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
Yi Zhang: Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
Dian Li: Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
Eric S. Fischer: Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA (corresponding author)
Collin Tokheim: Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA (corresponding author)
X. Shirley Liu: Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA (corresponding author)

Journal volume & issue: Vol. 20, no. 5, pp. 882–898

Abstract

Targeted protein degradation (TPD) has rapidly emerged as a therapeutic modality to eliminate previously undruggable proteins by repurposing the cell’s endogenous protein degradation machinery. However, the susceptibility of proteins for targeting by TPD approaches, termed “degradability”, is largely unknown. Here, we developed a machine learning model, model-free analysis of protein degradability (MAPD), to predict degradability from features intrinsic to protein targets. MAPD shows accurate performance in predicting kinases that are degradable by TPD compounds [with an area under the precision–recall curve (AUPRC) of 0.759 and an area under the receiver operating characteristic curve (AUROC) of 0.775] and is likely generalizable to independent non-kinase proteins. We found five features with statistical significance to achieve optimal prediction, with ubiquitination potential being the most predictive. By structural modeling, we found that E2-accessible ubiquitination sites, but not lysine residues in general, are particularly associated with kinase degradability. Finally, we extended MAPD predictions to the entire proteome to find 964 disease-causing proteins (including proteins encoded by 278 cancer genes) that may be tractable to TPD drug development.
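The two metrics quoted in the abstract, AUROC and AUPRC, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the labels and scores below are invented toy values, the function names are hypothetical, and AUPRC is approximated here by average precision (the step-wise area under the precision–recall curve), which may differ from the estimator used in the paper.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example.

    Assumes scores are distinct; tied scores would need extra handling.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data (assumption): 1 = degradable, 0 = not degradable.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(f"AUROC = {auroc(toy_labels, toy_scores):.3f}")
print(f"AUPRC (average precision) = {average_precision(toy_labels, toy_scores):.3f}")
```

AUROC measures how well the scores rank positives above negatives regardless of class balance, while AUPRC is more sensitive to performance on the positive class, which is why the two are often reported together when positives are relatively rare.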

Published in Genomics, Proteomics & Bioinformatics

ISSN: 1672-0229 (Print); 2210-3244 (Online)
Publisher: Oxford University Press
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General); Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://academic.oup.com/gpb