IEEE Access (Jan 2021)

On Approximation of Concept Similarity Measure in Description Logic ELH With Pre-Trained Word Embedding

  • Teeradaj Racharak

DOI
https://doi.org/10.1109/ACCESS.2021.3073730
Journal volume & issue
Vol. 9
pp. 61429 – 61443

Abstract

Data-driven and knowledge-driven methods are the two mainstream approaches to developing artificial intelligence systems. Data-driven methods learn a decision model from real-world observations, but they struggle to explain their results in human terms. Knowledge-driven methods, which employ symbolic reasoning based on the formal semantics of a knowledge base, are more interpretable and explainable, but they cannot cope with incomplete modeling of structured knowledge bases. This work tackles these issues for ontology similarity by proposing a general framework that combines the strengths of both approaches to measure the semantic similarity of concepts in a description logic (DL) ontology. More specifically, a neuro-symbolic integrated framework is defined that exploits pre-trained word embeddings together with the semantic definitions in an ontology to yield an explainable degree of concept similarity. To demonstrate its applicability, we develop a concrete similarity measure ${\textsf{sim}}_\epsilon$ conforming to the proposed framework and introduce an efficient algorithm that extracts an explanation of why such a degree is obtained. Correctness is shown by analyzing the theoretical properties that the measure is guaranteed to preserve and by an empirical evaluation with the medical ontology SNOMED CT and the medical pre-trained embedding BioWordVec. The results show that the proposed method retains both interpretability and explainability while achieving performance comparable to state-of-the-art data-driven and knowledge-driven approaches.

Keywords