Unraveling druggable cancer-driving proteins and targeted drugs using artificial intelligence and multi-omics analyses

Andrés López-Cortés; Alejandro Cabrera-Andrade; Gabriela Echeverría-Garcés; Paulina Echeverría-Espinoza; Micaela Pineda-Albán; Nicole Elsitdie; José Bueno-Miño; Carlos M. Cruz-Segundo; Julian Dorado; Alejandro Pazos; Humberto Gonzáles-Díaz; Yunierkis Pérez-Castillo; Eduardo Tejera; Cristian R. Munteanu

doi:10.1038/s41598-024-68565-7

Scientific Reports (Aug 2024)

Affiliations

Andrés López-Cortés: Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas
Alejandro Cabrera-Andrade: Grupo de Bio-Quimioinformática, Universidad de Las Américas
Gabriela Echeverría-Garcés: Centro de Referencia Nacional de Genómica, Secuenciación y Bioinformática, Instituto Nacional de Investigación en Salud Pública “Leopoldo Izquieta Pérez”
Paulina Echeverría-Espinoza: Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas
Micaela Pineda-Albán: Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas
Nicole Elsitdie: Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas
José Bueno-Miño: Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas
Carlos M. Cruz-Segundo: RNASA-IMEDIR, Computer Science Faculty, University of A Coruna
Julian Dorado: RNASA-IMEDIR, Computer Science Faculty, University of A Coruna
Alejandro Pazos: RNASA-IMEDIR, Computer Science Faculty, University of A Coruna
Humberto Gonzáles-Díaz: Department of Organic Chemistry II, University of the Basque Country UPV/EHU
Yunierkis Pérez-Castillo: Grupo de Bio-Quimioinformática, Universidad de Las Américas
Eduardo Tejera: Grupo de Bio-Quimioinformática, Universidad de Las Américas
Cristian R. Munteanu: RNASA-IMEDIR, Computer Science Faculty, University of A Coruna

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 22

Abstract

The druggable proteome refers to proteins that can bind small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 linear and non-linear machine learning classifiers. The best classifier was obtained with the support vector machine (SVM) method using 200 tri-amino acid composition descriptors. The model's high performance is evident from an area under the receiver operating characteristic curve (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, analysis of unfavorable prognostic proteins, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify the highest-affinity drugs capable of targeting the top-ranked predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, 23 of which showed unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth study in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins.
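The abstract names tri-amino acid composition descriptors as the SVM's input but does not define them. As a minimal sketch, assuming the term means the normalized frequency of each overlapping three-residue window over the 20 standard amino acids (20^3 = 8,000 possible tripeptides), the descriptors for one sequence could be computed as below. The function names `tripeptide_composition` and `subset`, and the choice to skip windows containing non-standard letters, are illustrative assumptions and are not taken from the authors' scripts. The abstract also does not say how the 200 descriptors used by the final model were selected, so `subset` simply takes a caller-supplied (hypothetical) list of tripeptides.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20**3 = 8000 tripeptides in a fixed lexicographic order, so every
# protein is mapped onto the same column layout.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {tp: i for i, tp in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence: str) -> list[float]:
    """Return the 8000-dimensional tri-amino acid composition of a sequence.

    Each value is the count of one overlapping three-residue window divided
    by the number of windows made only of standard residues. Windows that
    contain non-standard letters (e.g. X, U, B) are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        j = INDEX.get(seq[i:i + 3])
        if j is not None:
            counts[j] += 1
            total += 1
    if total == 0:
        # Too short or entirely non-standard: return an all-zero vector.
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


def subset(vector: list[float], tripeptides: list[str]) -> list[float]:
    """Keep only the named descriptors, e.g. a hypothetical 200-feature list."""
    return [vector[INDEX[tp]] for tp in tripeptides]


if __name__ == "__main__":
    vec = tripeptide_composition("MKTAYIAKQR")  # toy sequence, 8 windows
    print(len(vec))                                # 8000
    print(subset(vec, ["MKT", "KQR", "AAA"]))      # [0.125, 0.125, 0.0]
```

Fixing the tripeptide order once in `TRIPEPTIDES` guarantees that every protein shares the same column layout, which is what lets the resulting vectors (or a 200-column subset of them) be stacked into a feature matrix for a classifier such as an SVM.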

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/