Frontiers in Bioinformatics (Jul 2022)
Prediction of Adverse Drug Reaction Linked to Protein Targets Using Network-Based Information and Machine Learning
Abstract
Drug discovery attrition rates, particularly at advanced clinical trial stages, are high because of unexpected adverse drug reactions (ADR) elicited by novel drug candidates. Predicting undesirable ADRs produced by the modulation of certain protein targets would contribute to developing safer drugs, thereby reducing economic losses associated with high attrition rates. As opposed to the more traditional drug-centric approach, we propose a target-centric approach to predict associations between protein targets and ADRs. The implementation of the predictor is based on a machine learning classifier that integrates a set of eight independent network-based features. These include a network diffusion-based score, identification of protein modules based on network clustering algorithms, functional similarity among proteins, network distance to proteins that are part of safety panels used in preclinical drug development, set of network descriptors in the form of degree and betweenness centrality measurements, and conservation. This diverse set of descriptors were used to generate predictors based on different machine learning classifiers ranging from specific models for individual ADR to higher levels of abstraction as per MEDDRA hierarchy such as system organ class. The results obtained from the different machine-learning classifiers, namely, support vector machine, random forest, and neural network were further analyzed as a meta-predictor exploiting three different voting systems, namely, jury vote, consensus vote, and red flag, obtaining different models for each of the ADRs in analysis. The level of accuracy of the predictors justifies the identification of problematic protein targets both at the level of individual ADR as well as a set of related ADRs grouped in common system organ classes. As an example, the prediction of ventricular tachycardia achieved an accuracy and precision of 0.83 and 0.90, respectively, and a Matthew correlation coefficient of 0.70. We believe that this approach is a good complement to the existing methodologies devised to foresee potential liabilities in preclinical drug discovery. The method is available through the DocTOR utility at GitHub (https://github.com/cristian931/DocTOR).
Keywords