Current Research in Biotechnology (Jan 2024)

AutoTarget: Disease-Associated druggable target identification via node representation learning in PPI networks

  • Hyunseung Kong,
  • Inyoung Kim,
  • Byoung-Tak Zhang

Journal volume & issue
Vol. 8
p. 100260

Abstract

Drug target discovery, a pivotal early stage in drug development, is resource-intensive and crucial for ensuring drug efficacy. This study presents AutoTarget, a novel computational pipeline designed to identify disease-associated druggable targets by applying node representation learning to protein–protein interaction (PPI) networks. AutoTarget uses node2vec+ for node classification, incorporating neighborhood context and structural equivalence in PPI networks derived from the STRING database. Data from the Therapeutic Target Database (TTD) and DisGeNET were integrated to identify known drug targets and gene-disease associations, respectively. Each protein is embedded into a 128-dimensional vector space, capturing local network structures and enabling the identification of structurally equivalent proteins. A Naïve Bayes classifier, trained on these embeddings, achieved a recall of 0.90 and an F1 score of 0.79 in predicting potential drug targets. AutoTarget identified 3,979 novel potential druggable target proteins out of 19,333 proteins in the PPI network, which were mapped to 23,363 diseases using DisGeNET. This creates a comprehensive resource for disease-specific drug target exploration. Case studies on triple-negative breast cancer and obesity demonstrated AutoTarget’s capability to identify both established and emerging targets, such as CD44, MAPK3, and GIP. Visualization of embedding vectors using t-SNE revealed clear separations between functional protein families, including nuclear proteins, growth factor receptors, and the G proteins within the kinase proteins. This supports the method’s ability to capture biologically relevant information. However, limitations were noted, including the inability to distinguish between different types of disease-associated proteins based solely on network features. Overall, this study advances the application of machine learning and network theory for identifying druggable targets across a wide range of diseases.
AutoTarget provides researchers with a valuable tool for expediting the discovery of novel druggable targets, potentially streamlining the drug discovery process. The AutoTarget code and database are publicly available to facilitate further research.
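
The abstract reports recall and F1 but not precision. As a rough reading aid, the sketch below inverts the standard definition F1 = 2PR / (P + R) to estimate the precision implied by those two figures, and computes the share of network proteins flagged as novel candidates. The function names are illustrative and are not part of the AutoTarget code. Because the reported values are rounded to two decimals, the result is only approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_from_recall_f1(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given R and F1."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denominator


# Figures quoted in the abstract (rounded as reported).
reported_recall = 0.90
reported_f1 = 0.79
novel_candidates = 3979
network_proteins = 19333

# Precision is NOT stated in the abstract; this is an implied estimate.
implied_precision = precision_from_recall_f1(reported_recall, reported_f1)
novel_fraction = novel_candidates / network_proteins

print(f"implied precision: {implied_precision:.2f}")
print(f"share of network proteins flagged as novel: {novel_fraction:.1%}")
```

Under these assumptions the implied precision comes out near 0.70, which would mean the classifier favors recall over precision, and roughly one in five proteins in the network is proposed as a novel candidate.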