Frontiers in Marine Science (May 2014)

A QSAR approach for virtual screening of lead-like molecules en route to antitumor and antibiotic drugs from marine and microbial natural products

  • Florbela Pereira,
  • Diogo A. R. S. Latino

DOI
https://doi.org/10.3389/conf.fmars.2014.02.00062
Journal volume & issue
Vol. 1

Abstract

Natural products (NPs), or synthetic products inspired by NPs, have been the single most productive source of leads for drug development. In fact, more than half of the drugs approved from 1981 to 2010 were based on NPs.1 At the turn of the 21st century, a new branch of NP chemistry was fully established: Marine Natural Products (MNPs). The future seems very promising for this new NP subfield, since MNP chemists have already elucidated the chemical structures of over 22,000 novel compounds.2 Moreover, seven of these are already approved drugs (four anticancer, one antiviral, one for pain control, and one for hypertriglyceridemia).3 The success rate of drug discovery from the marine world is 1 drug per 3,140 natural products described. This rate is approximately 1.7- to 3.3-fold better than the industry average (1 in 5,000–10,000 tested compounds).4 Nowadays, facilities for high-throughput screening are available both in academic labs and in pharmaceutical companies, but the cost of randomly screening large compound collections can nevertheless be prohibitive. This makes chemoinformatics approaches to virtual screening, which prioritize the compounds most likely to be active, valuable tools.

The present study focuses on the application of machine learning (ML) techniques to explore lead-like molecules en route to antitumor and antibiotic drugs from 418 MNPs and microbial natural products (MbNPs) extracted from the AntiMarin database. The AntiMarin database contains approximately 50,000 compounds from marine macroorganisms and from both marine and terrestrial microorganisms.5 Our models were developed using 1,746 active and non-active compounds from the PubChem database.
State-of-the-art ML algorithms, such as Support Vector Machines (SVMs), Random Forests (RFs) and Classification Trees (CTs), were compared for predicting the two classes (i.e., active and non-active compounds) in the following classification tasks: (1) overall biological activity; (2) antitumor activity; and (3) antibiotic activity. For each task two models were built: one using 8 semi-empirical quantum-chemical descriptors calculated by the PM6 method (energy of the highest occupied molecular orbital, εHOMO; energy of the lowest unoccupied molecular orbital, εLUMO; hardness, η = (εLUMO - εHOMO); chemical potential, μ = -(εHOMO + εLUMO)/2; Mulliken electronegativity, χ = -μ; Parr & Pople absolute hardness, (εHOMO - εLUMO)/2; Schuurmann MO shift alpha, (εHOMO + εLUMO)/2; electrophilicity index, ω = μ²/(2η)) and the other using CDK descriptors and PM6 descriptors simultaneously. The results obtained with these two approaches were compared with our recently published work using CDK descriptors6 (Table 1). The best classification models for antibiotic and antitumor activities were used to screen a data set of marine and microbial natural products from the AntiMarin database. The screen yielded 25 and 4 possible lead-like compounds for antibiotic and antitumor drug design (Figures 1 and 2, respectively). Of those 25 lead-like antibiotic MNPs and MbNPs, seven (IDs 484, 735, 742, 861, 959, 739, and 741) had already been predicted as active by the overall biological activity model. All compounds suggested by our approach are classified as non-antibiotic and non-antitumor compounds in the AntiMarin database. Recently, several of the lead-like compounds proposed by us were reported in the literature as being active.

Figure 1. The 15 unreported lead antibiotic MNPs and MbNPs from the AntiMarin database, selected using the best RF antibiotic model with a probability of being antibiotic greater than or equal to 0.8.

Figure 2. The 4 selected lead antitumor MNPs and MbNPs from the AntiMarin database, selected using the best RF antitumor model with a probability of being antitumor greater than or equal to 0.8.

The present work corroborates the results of our previous work6 and also presents a new set of possible lead-like bioactive compounds. Additionally, it demonstrates the usefulness of quantum-chemical descriptors in discriminating between biologically active and inactive compounds. The use of the εHOMO quantum-chemical descriptor to discriminate large-scale data sets of lead-like or drug-like compounds has not previously been reported. This approach greatly reduces the number of compounds that need to be tested in real screens. Furthermore, besides virtual screening, computational methods can be very useful for building appropriate databases that allow effective shortcuts in the dereplication of NP extracts, which will certainly increase the efficiency of drug discovery.
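
As an illustration only, the sketch below shows how the eight descriptors listed above could be derived from a pair of frontier-orbital energies, and how a probability cutoff of 0.8 could be applied to model outputs. The formulas are transcribed as printed in this abstract (sign conventions vary between sources). The names `FrontierDescriptors`, `frontier_descriptors` and `select_leads`, the example energies, the compound IDs and the probabilities are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FrontierDescriptors:
    """The eight descriptors named in the abstract (energies assumed in eV)."""
    e_homo: float
    e_lumo: float
    hardness: float
    chemical_potential: float
    electronegativity: float
    parr_pople_hardness: float
    schuurmann_shift: float
    electrophilicity: float


def frontier_descriptors(e_homo: float, e_lumo: float) -> FrontierDescriptors:
    """Derive the descriptors from HOMO/LUMO energies, as written in the abstract."""
    hardness = e_lumo - e_homo                      # eta = eLUMO - eHOMO
    if hardness == 0:
        raise ValueError("HOMO and LUMO energies coincide; omega is undefined")
    mu = -(e_homo + e_lumo) / 2                     # chemical potential as printed
    return FrontierDescriptors(
        e_homo=e_homo,
        e_lumo=e_lumo,
        hardness=hardness,
        chemical_potential=mu,
        electronegativity=-mu,                      # chi = -mu
        parr_pople_hardness=(e_homo - e_lumo) / 2,  # as printed in the abstract
        schuurmann_shift=(e_homo + e_lumo) / 2,
        electrophilicity=mu ** 2 / (2 * hardness),  # omega = mu^2 / (2 eta)
    )


def select_leads(probabilities: Dict[int, float], threshold: float = 0.8) -> List[int]:
    """Return the IDs whose predicted probability of activity is >= threshold."""
    return sorted(cid for cid, p in probabilities.items() if p >= threshold)


if __name__ == "__main__":
    # Made-up orbital energies, not values from the study.
    print(frontier_descriptors(e_homo=-9.0, e_lumo=-1.0))
    # Hypothetical compound IDs and class probabilities.
    print(select_leads({101: 0.93, 102: 0.80, 103: 0.42}))
```

In practice the orbital energies would come from a PM6 calculation and the probabilities from a trained classifier; neither step is reproduced here.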

Keywords