Extracting ontological knowledge from Java source code using Hidden Markov Models

Jiomekong Azanzi; Camara Gaoussou; Tchuente Maurice

doi:10.1515/comp-2019-0013

Open Computer Science (Aug 2019)

Affiliations

Jiomekong Azanzi: University of Yaounde I, Faculty of Science, Yaounde, Cameroon; IRD, Sorbonne Université, UMMISCO, F-93143, Bondy, France
Camara Gaoussou: LIMA, Université Alioune Diop de Bambey, Sénégal; IRD, Sorbonne Université, UMMISCO, F-93143, Bondy, France
Tchuente Maurice: University of Yaounde I, Faculty of Science, Yaounde, Cameroon; IRD, Sorbonne Université, UMMISCO, F-93143, Bondy, France

Journal volume & issue: Vol. 9, no. 1
pp. 181 – 199

Abstract

For many decades, ontologies have been a key element of information systems, for instance in the epidemiological surveillance domain. Building domain ontologies requires access to domain knowledge held by domain experts or contained in knowledge sources. However, domain experts are not always available for interviews. There is therefore considerable value in ontology learning, which consists of the automatic or semi-automatic extraction of ontological knowledge from structured or unstructured knowledge sources such as texts and databases. Many techniques have been used, but they are all limited to the extraction of concepts, properties and terminology, leaving axioms and rules aside. Source code, which naturally embeds domain knowledge, is rarely used as a knowledge source. In this paper, we propose an approach based on Hidden Markov Models (HMMs) for learning concepts, properties, axioms and rules from Java source code. The approach was tested on the source code of EPICAM, an epidemiological platform developed in Java and used in Cameroon for tuberculosis surveillance. Domain experts involved in the evaluation judged the extracted knowledge to be relevant to the domain. In addition, we automatically evaluated the relevance of the extracted terms to the medical domain by aligning them with ontologies hosted on the BioPortal platform through the Ontology Recommender tool. The results were promising: the extracted terms were 82.9% covered by several biomedical ontologies, including NCIT, SNOMEDCT and ONTOPARON.
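The abstract does not detail the model, so the following is only a rough sketch, under stated assumptions, of how an HMM can label Java tokens so that concept and property names fall out of the decoded state sequence. Everything here is hypothetical and chosen for illustration: the class name `HmmTermExtractor`, the hidden states `OTHER`/`PRE`/`TARGET`/`POST`, the four coarse token classes, and the hand-set probabilities. None of it is taken from the paper or learned from EPICAM. Decoding uses the standard Viterbi algorithm in log space, and tokens decoded as `TARGET` are returned as candidate ontology terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only: Viterbi decoding of a hand-parameterised HMM over Java tokens. */
public class HmmTermExtractor {
    // Hypothetical hidden states: context before a term, the term itself, context after it, anything else.
    static final int OTHER = 0, PRE = 1, TARGET = 2, POST = 3;
    // Observation symbols: coarse token classes.
    static final int KEYWORD = 0, MODIFIER = 1, IDENT = 2, SYMBOL = 3;

    // Hand-set probabilities (assumed for illustration; a real system would estimate them from annotated code).
    static final double[] START = {0.45, 0.45, 0.05, 0.05};
    static final double[][] TRANS = {
        {0.50, 0.40, 0.05, 0.05},   // from OTHER
        {0.05, 0.45, 0.45, 0.05},   // from PRE
        {0.10, 0.05, 0.05, 0.80},   // from TARGET
        {0.50, 0.40, 0.05, 0.05}};  // from POST
    static final double[][] EMIT = {
        {0.05, 0.05, 0.30, 0.60},   // OTHER emits KEYWORD, MODIFIER, IDENT, SYMBOL
        {0.30, 0.40, 0.25, 0.05},   // PRE
        {0.01, 0.01, 0.97, 0.01},   // TARGET
        {0.01, 0.01, 0.08, 0.90}};  // POST

    // An identifier, or any single non-whitespace character (braces, semicolons, ...).
    static final Pattern TOKEN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\S");
    static final Set<String> KEYWORDS = Set.of("class", "interface", "enum");
    static final Set<String> MODIFIERS = Set.of("public", "private", "protected", "static", "final");

    public static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    public static int classify(String tok) {
        if (KEYWORDS.contains(tok)) return KEYWORD;
        if (MODIFIERS.contains(tok)) return MODIFIER;
        char c = tok.charAt(0);
        return (Character.isLetter(c) || c == '_') ? IDENT : SYMBOL;
    }

    /** Most likely hidden-state sequence for the observations (Viterbi, log probabilities). */
    public static int[] viterbi(int[] obs) {
        int n = obs.length, k = START.length;
        int[] path = new int[n];
        if (n == 0) return path;
        double[][] delta = new double[n][k];
        int[][] back = new int[n][k];
        for (int s = 0; s < k; s++) delta[0][s] = Math.log(START[s]) + Math.log(EMIT[s][obs[0]]);
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double score = delta[t - 1][p] + Math.log(TRANS[p][s]);
                    if (score > best) { best = score; arg = p; }
                }
                delta[t][s] = best + Math.log(EMIT[s][obs[t]]);
                back[t][s] = arg;
            }
        }
        // Pick the best final state, then follow back-pointers to recover the path.
        for (int s = 1; s < k; s++) if (delta[n - 1][s] > delta[n - 1][path[n - 1]]) path[n - 1] = s;
        for (int t = n - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }

    /** Returns the tokens decoded as TARGET, i.e. candidate concept or property names. */
    public static List<String> extractTargets(String src) {
        List<String> tokens = tokenize(src);
        int[] obs = new int[tokens.size()];
        for (int i = 0; i < obs.length; i++) obs[i] = classify(tokens.get(i));
        int[] states = viterbi(obs);
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < states.length; i++) if (states[i] == TARGET) targets.add(tokens.get(i));
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(extractTargets("public class Patient { private String name; }"));
    }
}
```

With these hand-set parameters, `main` decodes the nine tokens as PRE PRE TARGET POST PRE PRE TARGET POST OTHER and prints `[Patient, name]`: the class name is a candidate concept and the field name a candidate property. The low TARGET-to-TARGET transition is what makes the type `String` decode as context and `name` as the term. Extracting axioms and rules, as the paper claims, would need richer states and observations than this toy model has.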

Published in Open Computer Science

ISSN: 2299-1093 (Online)
Publisher: De Gruyter
Country of publisher: Poland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.degruyter.com/view/j/comp