Inverse similarity and reliable negative samples for drug side-effect prediction

Yi Zheng; Hui Peng; Shameek Ghosh; Chaowang Lan; Jinyan Li

doi:10.1186/s12859-018-2563-x

BMC Bioinformatics (Feb 2019)

Affiliations

Yi Zheng: Advanced Analytics Institute, FEIT, University of Technology Sydney
Hui Peng: Advanced Analytics Institute, FEIT, University of Technology Sydney
Shameek Ghosh: Advanced Analytics Institute, FEIT, University of Technology Sydney
Chaowang Lan: Advanced Analytics Institute, FEIT, University of Technology Sydney
Jinyan Li: Advanced Analytics Institute, FEIT, University of Technology Sydney

DOI: https://doi.org/10.1186/s12859-018-2563-x
Journal volume & issue: Vol. 19, no. S13
Pages: 91–104

Abstract

Background: In silico prediction of potential drug side-effects is crucial for drug development, because wet-lab identification of drug side-effects is expensive and time-consuming. Existing computational methods mainly leverage validated drug side-effect relations for prediction, and their performance is severely impeded by the lack of reliable negative training data. A method for selecting reliable negative samples is therefore vital for improving performance.

Methods: Most existing computational prediction methods rest on the assumption that similar drugs tend to share the same side-effects, an assumption that has yielded remarkable performance. It is also reasonable to assume the inverse proposition: dissimilar drugs are less likely to share the same side-effects. Based on this inverse similarity hypothesis, we propose a novel method to select highly reliable negative samples for side-effect prediction. The first step is to build a drug similarity integration framework that measures the similarity between drugs from different perspectives. This step integrates drug chemical structures, drug target proteins, drug substituents, and drug therapeutic information as features into a unified framework. Then, a similarity score between each candidate negative drug and the validated positive drugs is calculated using the similarity integration framework. Candidate negative drugs with lower similarity scores are preferentially selected as negative samples. Finally, both the validated positive drugs and the selected highly reliable negative samples are used for prediction.

Results: The proposed method was evaluated on simulated side-effect prediction for 917 DrugBank drugs, using four machine-learning algorithms for comparison. Extensive experiments show that the drug similarity integration framework captures drug features better than approaches based on a single type of drug property, achieving much better performance. In addition, the four machine-learning algorithms achieved significant improvements in macro-averaging F1-score (e.g., SVM from 0.655 to 0.898), macro-averaging precision (e.g., RBF from 0.592 to 0.828) and macro-averaging recall (e.g., KNN from 0.651 to 0.772), attributable to the highly reliable negative samples selected by the proposed method.

Conclusions: The results suggest that the inverse similarity hypothesis and the integration of different drug properties are valuable for side-effect prediction. The selection of highly reliable negative samples also contributes significantly to the performance improvement.
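
The negative-sample selection described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the per-property similarities are fused or how a candidate's score against the positive drugs is aggregated, so the equal-weight average and the mean-similarity score below are assumptions, as are the function names and the toy 5-drug matrices.

```python
import numpy as np

def integrate_similarities(sim_matrices, weights=None):
    """Fuse per-property drug-drug similarity matrices into one matrix.

    Assumption: a weighted average (equal weights by default). The fusion
    rule actually used in the paper is not given in the abstract.
    """
    stacked = np.stack(sim_matrices)              # shape (k, n, n)
    if weights is None:
        weights = np.full(len(sim_matrices), 1.0 / len(sim_matrices))
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, stacked, axes=1)  # shape (n, n)

def select_reliable_negatives(similarity, positive_idx, n_negatives):
    """Pick the candidates least similar to the validated positive drugs.

    Assumption: a candidate's score is its mean similarity to all positives.
    """
    positive_idx = np.asarray(positive_idx)
    candidates = np.setdiff1d(np.arange(similarity.shape[0]), positive_idx)
    scores = similarity[np.ix_(candidates, positive_idx)].mean(axis=1)
    order = np.argsort(scores, kind="stable")      # lowest similarity first
    return candidates[order[:n_negatives]]

# Toy data (hypothetical): two similarity views over five drugs.
chem = np.array([[1.0, 0.9, 0.2, 0.1, 0.5],
                 [0.9, 1.0, 0.3, 0.2, 0.4],
                 [0.2, 0.3, 1.0, 0.8, 0.6],
                 [0.1, 0.2, 0.8, 1.0, 0.7],
                 [0.5, 0.4, 0.6, 0.7, 1.0]])
target = np.array([[1.0, 0.7, 0.0, 0.1, 0.3],
                   [0.7, 1.0, 0.1, 0.0, 0.4],
                   [0.0, 0.1, 1.0, 0.6, 0.2],
                   [0.1, 0.0, 0.6, 1.0, 0.5],
                   [0.3, 0.4, 0.2, 0.5, 1.0]])

integrated = integrate_similarities([chem, target])
# Drugs 0 and 1 are validated positives for some side-effect.
negatives = select_reliable_negatives(integrated, positive_idx=[0, 1], n_negatives=2)
print(negatives.tolist())  # [3, 2]
```

In this toy case drugs 3 and 2 are chosen because they are the least similar to the positive drugs 0 and 1, while drug 4, which is moderately similar to both, is left out of the negative set. The selected negatives and the validated positives would then form the training set for a classifier.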

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
