Automating Computer Science Ontology Extension With Classification Techniques

Natasha C. Santosa; Jun Miyazaki; Hyoil Han

doi:10.1109/ACCESS.2021.3131627

IEEE Access (Jan 2021)

Affiliations

Natasha C. Santosa: Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo, Japan
Jun Miyazaki: Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo, Japan
Hyoil Han: School of Information Technology, Illinois State University, Normal, IL, USA

Journal volume & issue: Vol. 9
pp. 161815 – 161833

Abstract

In information technology, an ontology is a knowledge structure consisting of terminologies (topics), their definitions, and relational information within one or multiple domains. This semantically represented information can be used for downstream tasks, such as document classification and recommendation systems. However, as big data prevails, manually extending existing ontologies with up-to-date information becomes challenging because the process is tedious and time-consuming and expert manual labor is costly. To alleviate this problem, this paper aims to achieve fully automatic ontology extension. We propose a novel “Direct” approach for extending an existing Computer Science Ontology (CSO). This approach consists of two steps: first, extending the CSO with new topics; second, using the extended graph to obtain the new topics’ node embeddings as inputs for training classifiers. Because the initial extension still contains numerous noisy links, the classifier simultaneously acts as a noisy-link filter and a link predictor. We experiment with various traditional machine learning and recent deep learning models and compare them under our Direct approach framework. We propose two evaluation procedures to decide the best-performing model: a novel Wikipedia-based $F1_{w}$ score and the total number of resulting links. Further meta-evaluation employing four experts confirmed the reliability of our proposed approach and evaluation procedures. We found that the Direct approach’s Gaussian Naive Bayes model produces the most valid and reliable links; therefore, we use it to further extend the CSO with hundreds of new CS topics and links.
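The idea of a classifier acting as a noisy-link filter can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional embeddings are made-up toy values (the paper derives embeddings from the extended CSO graph), the element-wise absolute difference used as a link feature is an assumption chosen for illustration, and all function names are hypothetical. The Gaussian Naive Bayes model is written out in plain Python so the example has no dependencies.

```python
import math
from collections import defaultdict


def link_features(embeddings, source, target):
    """Hypothetical link feature: element-wise |difference| of node embeddings."""
    return [abs(a - b) for a, b in zip(embeddings[source], embeddings[target])]


def fit_gaussian_nb(features, labels, var_floor=1e-6):
    """Estimate a prior plus per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for x, y in zip(features, labels):
        rows_by_class[y].append(x)
    model = {}
    for y, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor keeps the variance strictly positive for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[y] = (n / len(features), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest (unnormalised) log posterior."""
    scores = {}
    for y, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[y] = score
    return max(scores, key=scores.get)


def filter_links(model, embeddings, candidates):
    """Keep only candidate links that the classifier labels as valid (1)."""
    return [
        (s, t) for s, t in candidates
        if predict(model, link_features(embeddings, s, t)) == 1
    ]


# Toy embeddings (invented values, for illustration only).
embeddings = {
    "machine learning": [0.90, 0.10],
    "deep learning": [0.80, 0.20],
    "neural networks": [0.85, 0.15],
    "databases": [0.10, 0.90],
    "sql": [0.15, 0.85],
    "query optimization": [0.20, 0.80],
    "graph neural networks": [0.82, 0.18],  # the newly added topic
}

# Labelled links between existing topics: 1 = valid, 0 = noisy.
training_links = [
    ("machine learning", "deep learning", 1),
    ("machine learning", "neural networks", 1),
    ("deep learning", "neural networks", 1),
    ("databases", "sql", 1),
    ("databases", "query optimization", 1),
    ("sql", "query optimization", 1),
    ("machine learning", "sql", 0),
    ("databases", "deep learning", 0),
    ("neural networks", "query optimization", 0),
    ("machine learning", "databases", 0),
]

X = [link_features(embeddings, s, t) for s, t, _ in training_links]
y = [label for _, _, label in training_links]
model = fit_gaussian_nb(X, y)

# Candidate links produced by a (hypothetical) initial extension step.
candidates = [
    ("graph neural networks", "neural networks"),
    ("graph neural networks", "sql"),
    ("graph neural networks", "deep learning"),
    ("graph neural networks", "databases"),
]
kept = filter_links(model, embeddings, candidates)
print(kept)
```

In this toy setup the classifier keeps the links from the new topic to the two nearby topics and discards the two cross-area links. A real pipeline would replace the invented embeddings and labels with ones derived from the ontology graph.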

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
