Open Computer Science (Aug 2019)
Extracting ontological knowledge from Java source code using Hidden Markov Models
Abstract
Ontologies have been a key element of information systems for several decades, for example in the epidemiological surveillance domain. Building domain ontologies requires access to domain knowledge held by domain experts or contained in knowledge sources. However, domain experts are not always available for interviews. There is therefore considerable value in ontology learning, which consists of the automatic or semi-automatic extraction of ontological knowledge from structured or unstructured knowledge sources such as texts and databases. Many techniques have been used, but they are all limited to the extraction of concepts, properties and terminology, leaving out axioms and rules. Source code, which naturally embeds domain knowledge, is rarely used. In this paper, we propose an approach based on Hidden Markov Models (HMMs) for learning concepts, properties, axioms and rules from Java source code. The approach is tested on the source code of EPICAM, an epidemiological platform developed in Java and used in Cameroon for tuberculosis surveillance. The domain experts involved in the evaluation judged the extracted knowledge to be relevant to the domain. In addition, we performed an automatic evaluation of the relevance of the extracted terms to the medical domain by aligning them with ontologies hosted on the BioPortal platform through the Ontology Recommender tool. The extracted terms were covered at 82.9% by several biomedical ontologies such as NCIT, SNOMEDCT and ONTOPARON.
Keywords