BMC Bioinformatics (Oct 2008)
Classification of drug molecules considering their IC<sub>50 </sub>values using mixed-integer linear programming based hyper-boxes method
Abstract
Abstract Background A priori analysis of the activity of drugs on the target protein by computational approaches can be useful in narrowing down drug candidates for further experimental tests. Currently, there are a large number of computational methods that predict the activity of drugs on proteins. In this study, we approach the activity prediction problem as a classification problem and, we aim to improve the classification accuracy by introducing an algorithm that combines partial least squares regression with mixed-integer programming based hyper-boxes classification method, where drug molecules are classified as low active or high active regarding their binding activity (IC50 values) on target proteins. We also aim to determine the most significant molecular descriptors for the drug molecules. Results We first apply our approach by analyzing the activities of widely known inhibitor datasets including Acetylcholinesterase (ACHE), Benzodiazepine Receptor (BZR), Dihydrofolate Reductase (DHFR), Cyclooxygenase-2 (COX-2) with known IC50 values. The results at this stage proved that our approach consistently gives better classification accuracies compared to 63 other reported classification methods such as SVM, Naïve Bayes, where we were able to predict the experimentally determined IC50 values with a worst case accuracy of 96%. To further test applicability of this approach we first created dataset for Cytochrome P450 C17 inhibitors and then predicted their activities with 100% accuracy. Conclusion Our results indicate that this approach can be utilized to predict the inhibitory effects of inhibitors based on their molecular descriptors. This approach will not only enhance drug discovery process, but also save time and resources committed.