BMC Bioinformatics (Apr 2018)
A two-tiered unsupervised clustering approach for drug repositioning through heterogeneous data integration
Abstract
Abstract Background Drug repositioning is the process of identifying new uses for existing drugs. Computational drug repositioning methods can reduce the time, costs and risks of drug development by automating the analysis of the relationships in pharmacology networks. Pharmacology networks are large and heterogeneous. Clustering drugs into small groups can simplify large pharmacology networks, these subgroups can also be used as a starting point for repositioning drugs. In this paper, we propose a two-tiered drug-centric unsupervised clustering approach for drug repositioning, integrating heterogeneous drug data profiles: drug-chemical, drug-disease, drug-gene, drug-protein and drug-side effect relationships. Results The proposed drug repositioning approach is threefold; (i) clustering drugs based on their homogeneous profiles using the Growing Self Organizing Map (GSOM); (ii) clustering drugs based on drug-drug relation matrices based on the previous step, considering three state-of-the-art graph clustering methods; and (iii) inferring drug repositioning candidates and assigning a confidence value for each identified candidate. In this paper, we compare our two-tiered clustering approach against two existing heterogeneous data integration approaches with reference to the Anatomical Therapeutic Chemical (ATC) classification, using GSOM. Our approach yields Normalized Mutual Information (NMI) and Standardized Mutual Information (SMI) of 0.66 and 36.11, respectively, while the two existing methods yield NMI of 0.60 and 0.64 and SMI of 22.26 and 33.59. Moreover, the two existing approaches failed to produce useful cluster separations when using graph clustering algorithms while our approach is able to identify useful clusters for drug repositioning. Furthermore, we provide clinical evidence for four predicted results (Chlorthalidone, Indomethacin, Metformin and Thioridazine) to support that our proposed approach can be reliably used to infer ATC code and drug repositioning. Conclusion The proposed two-tiered unsupervised clustering approach is suitable for drug clustering and enables heterogeneous data integration. It also enables identifying reliable repositioning drug candidates with reference to ATC therapeutic classification. The repositioning drug candidates identified consistently by multiple clustering algorithms and with high confidence have a higher possibility of being effective repositioning candidates.
Keywords