IEEE Access (Jan 2021)
Automating Computer Science Ontology Extension With Classification Techniques
Abstract
In information technology, an ontology is a knowledge structure consisting of terms (topics), their definitions, and relational information within one or multiple domains. This semantically represented information can be used for downstream tasks, such as document classification and recommendation systems. However, as data volumes grow, manually extending existing ontologies with up-to-date information becomes challenging: the process is tedious and time-consuming, and expert manual labor is expensive. To alleviate this problem, this paper aims at fully automatic ontology extension. We propose a novel “Direct” approach for extending an existing Computer Science Ontology (CSO). This approach consists of two steps: first, extending the CSO with new topics; second, using the extended graph to obtain the new topics’ node embeddings as inputs for training classifiers. Because the initial extension still contains numerous noisy links, the classifier acts simultaneously as a noisy-link filter and a link predictor. We experiment with various traditional machine learning and recent deep learning models and compare them under our Direct approach framework. We propose two evaluation procedures to select the best-performing model: a novel Wikipedia-based $F1_{w}$ score and the total number of resulting links. A further meta-evaluation by four experts confirmed the reliability of our proposed approach and evaluation procedures. We found that the Direct approach’s Gaussian Naive Bayes model produces the most valid and reliable links; therefore, we use it to further extend the CSO with hundreds of new CS topics and links.
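To make the classifier's role as a noisy-link filter concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that node embeddings are already available, that a candidate link is represented by the element-wise absolute difference of its two topic embeddings, and that a small set of links labeled valid (1) or noisy (0) exists for training. The embedding values, the `link_features` helper, and the simplified `GaussianNB` class (with a plain additive variance floor) are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def link_features(emb_a, emb_b):
    """Hypothetical feature map: element-wise absolute difference of two embeddings."""
    return [abs(a - b) for a, b in zip(emb_a, emb_b)]


class GaussianNB:
    """Minimal Gaussian Naive Bayes: one mean and variance per class and feature."""

    def __init__(self, var_floor=1e-9):
        self.var_floor = var_floor  # additive floor that avoids zero variance
        self.stats = {}             # label -> (log_prior, means, variances)

    def fit(self, X, y):
        rows_by_label = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_label[label].append(row)
        for label, rows in rows_by_label.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / n + self.var_floor
                for col, m in zip(columns, means)
            ]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def _log_joint(self, row, label):
        log_prior, means, variances = self.stats[label]
        total = log_prior
        for v, m, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_joint(row, c)) for row in X]


# Made-up 2-D embeddings; in practice they would come from the extended graph.
embeddings = {
    "machine learning": [0.90, 0.10],
    "neural networks": [0.80, 0.20],
    "deep learning": [0.85, 0.15],
    "databases": [0.10, 0.90],
    "sql": [0.15, 0.85],
    "query optimization": [0.20, 0.80],
}

# Hypothetical labeled links: 1 = valid, 0 = noisy.
labeled_links = [
    ("machine learning", "neural networks", 1),
    ("databases", "sql", 1),
    ("machine learning", "deep learning", 1),
    ("machine learning", "sql", 0),
    ("databases", "neural networks", 0),
    ("deep learning", "databases", 0),
]
X_train = [link_features(embeddings[a], embeddings[b]) for a, b, _ in labeled_links]
y_train = [label for _, _, label in labeled_links]
model = GaussianNB().fit(X_train, y_train)

# Candidate links from an initial, noisy extension step.
candidate_links = [
    ("deep learning", "neural networks"),
    ("sql", "query optimization"),
    ("deep learning", "query optimization"),
]
X_cand = [link_features(embeddings[a], embeddings[b]) for a, b in candidate_links]
predictions = model.predict(X_cand)

# Keep only the links the classifier predicts as valid.
kept_links = [link for link, p in zip(candidate_links, predictions) if p == 1]
print(kept_links)
```

The difference-based feature is used here because Naive Bayes treats features independently; simply concatenating the two embeddings would not let it separate same-cluster pairs from cross-cluster pairs in this toy data.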
Keywords