Data mining of PubChem bioassay records reveals diverse OXPHOS inhibitory chemotypes as potential therapeutic agents against ovarian cancer

Sejal Sharma; Liping Feng; Nicha Boonpattrawong; Arvinder Kapur; Lisa Barroilhet; Manish S. Patankar; Spencer S. Ericksen

doi:10.1186/s13321-024-00906-0

Journal of Cheminformatics (Oct 2024)

Affiliations

Sejal Sharma: University of Wisconsin-Madison, Department of Obstetrics and Gynecology
Liping Feng: University of Wisconsin-Madison, Department of Obstetrics and Gynecology
Nicha Boonpattrawong: University of Wisconsin-Madison, Department of Obstetrics and Gynecology
Arvinder Kapur: University of Wisconsin-Madison, Department of Obstetrics and Gynecology
Lisa Barroilhet: University of Wisconsin-Madison, Department of Obstetrics and Gynecology
Manish S. Patankar: University of Wisconsin-Madison, Department of Obstetrics and Gynecology
Spencer S. Ericksen: University of Wisconsin-Madison, UW-Carbone Cancer Center, Drug Development Core, Small Molecule Screening Facility, Wisconsin Institutes for Medical Research

DOI: https://doi.org/10.1186/s13321-024-00906-0
Journal volume & issue: Vol. 16, no. 1
pp. 1 – 21

Abstract

Focused screening on target-prioritized compound sets can be an efficient alternative to high-throughput screening (HTS). For most biomolecular targets, compound prioritization models depend on prior screening data or a target structure. For phenotypic or multi-protein pathway targets, it may not be clear which public assay records provide relevant data. The question also arises as to whether data collected from disparate assays might be usefully consolidated. Here, we report on the development and application of a data mining pipeline to examine these issues. To illustrate, we focus on identifying inhibitors of oxidative phosphorylation (OXPHOS), a druggable metabolic process in epithelial ovarian tumors. The pipeline compiled 8415 available OXPHOS-related bioassays in the PubChem data repository involving 312,093 unique compound records. Application of PubChem assay activity annotations, PAINS (Pan Assay Interference Compounds), and Lipinski-like bioavailability filters yields 1852 putative OXPHOS-active compounds that fall into 464 clusters. These chemotypes are diverse, with relatively high hydrophobicity and molecular weight but lower complexity and drug-likeness. They show a high abundance of bicyclic ring systems and oxygen-containing functional groups, including ketones, allylic oxides (alpha/beta unsaturated carbonyls), hydroxyl groups, and ethers. In contrast, amide and primary amine functional groups have a notably lower-than-random prevalence. UMAP representation of the chemical space shows strong divergence in the regions occupied by OXPHOS-inactive and -active compounds. Of the six compounds selected for biological testing, four showed statistically significant inhibition of electron transport in bioenergetics assays. Two of these four compounds, lacidipine and esbiothrin, increased intracellular oxygen radicals (a major hallmark of most OXPHOS inhibitors) and decreased the viability of two ovarian cancer cell lines, ID8 and OVCAR5.
Finally, data from the pipeline were used to train random forest and support vector classifiers that effectively prioritized OXPHOS inhibitory compounds within a held-out test set (ROC AUC 0.962 and 0.927, respectively) and on another set containing 44 documented OXPHOS inhibitors outside of the training set (ROC AUC 0.900 and 0.823). This prototype pipeline is extensible and could be adapted for focused screening on other phenotypic targets for which sufficient public data are available.

Scientific contribution

Here, we describe and apply an assay data mining pipeline to compile, process, filter, and mine public bioassay data. We believe the procedure may be more broadly applied to guide compound selection in early-stage hit finding on novel multi-protein mechanistic or phenotypic targets. To demonstrate the utility of our approach, we apply a data mining strategy to a large set of public assay data to find drug-like molecules that inhibit oxidative phosphorylation (OXPHOS) as candidates for ovarian cancer therapies.
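
The ROC AUC values quoted in the abstract summarize how well a classifier ranks active compounds ahead of inactive ones. As a minimal illustration of that metric only (not the authors' evaluation code), the sketch below computes ROC AUC as the fraction of active/inactive pairs in which the active compound receives the higher score, counting ties as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of actives (1) and inactives (0).

    Equals the probability that a randomly chosen active is scored
    higher than a randomly chosen inactive; ties contribute 0.5.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")

    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical toy data: 1 = OXPHOS-active, 0 = inactive.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")  # prints "ROC AUC = 0.889"
```

A value of 1.0 means every active outranks every inactive, and 0.5 corresponds to random ordering. The pairwise loop is quadratic in the number of compounds, so for data sets of the size described here a rank-based implementation from an established library would be the practical choice.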

Published in Journal of Cheminformatics

ISSN: 1758-2946 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Chemistry
Website: https://jcheminf.biomedcentral.com/
