Exploiting Latent Semantic Subspaces to Derive Associations for Specific Pharmaceutical Semantics

Janus Wawrzinek; José María González Pinto; Oliver Wiehr; Wolf-Tilo Balke

doi:10.1007/s41019-020-00140-2

Data Science and Engineering (Aug 2020)

Affiliations

Janus Wawrzinek: Institute for Information Systems, TU-Braunschweig
José María González Pinto: Institute for Information Systems, TU-Braunschweig
Oliver Wiehr: Institute for Information Systems, TU-Braunschweig
Wolf-Tilo Balke: Institute for Information Systems, TU-Braunschweig

DOI: https://doi.org/10.1007/s41019-020-00140-2
Journal volume & issue: Vol. 5, no. 4
pp. 333 – 345

Abstract

State-of-the-art approaches in the field of neural embedding models (NEMs) enable progress in the automatic extraction and prediction of semantic relations between important entities like active substances, diseases, and genes. In particular, the prediction property makes them valuable for important research-related tasks such as hypothesis generation and drug repositioning. A core challenge in the biomedical domain is to obtain interpretable semantics from NEMs that can distinguish, for instance, between the following two situations: (a) drug x induces disease y and (b) drug x treats disease y. However, NEMs alone cannot distinguish between associations such as treats or induces. Is it possible to develop a model to learn a latent representation from the NEMs capable of such disambiguation? To what extent do we need domain knowledge to succeed in the task? In this paper, we answer both questions and show that our proposed approach not only succeeds in the disambiguation task but also advances current growing research efforts to find real predictions using a sophisticated retrospective analysis. Furthermore, we investigate which type of associations is generally better contextualized and therefore probably has a stronger influence in our disambiguation task. In this context, we present an approach to extract an interpretable latent semantic subspace from the original embedding space in which therapeutic drug–disease associations are more likely.

Published in Data Science and Engineering

ISSN: 2364-1185 (Print); 2364-1541 (Online)
Publisher: SpringerOpen
Country of publisher: Germany
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.springer.com/41019
