Research on the Method of Extracting Domain Knowledge From the Freebase RDF Dumps

Deyan Chen; Hong Zhao

doi:10.1109/access.2018.2868516

IEEE Access (Jan 2018)

Affiliations

Deyan Chen: School of Computer Science & Engineering, Northeastern University, Shenyang, China
Hong Zhao: School of Computer Science & Engineering, Northeastern University, Shenyang, China

DOI: https://doi.org/10.1109/access.2018.2868516
Journal volume & issue: Vol. 6, pp. 50306 – 50322

Abstract

When constructing an ontology-based domain semantic knowledge base, reusing existing domain knowledge bases not only facilitates sharing, integration, and reuse of the resulting knowledge base but also accelerates its construction. The open and fast-growing Freebase database is a good data source for this purpose. However, extracting domain knowledge from the Freebase Resource Description Framework (RDF) dumps faces many challenges: for example, the dump package is too large to read or load, it contains many unnecessary and redundant facts, and some ill-formed triples may cause loading to fail. In response to these obstacles and the deficiencies of existing research, this paper proposes a method to extract domain knowledge quickly, accurately, and completely from the Freebase RDF dumps and to describe that knowledge using the semantic constructs of standard ontology description languages. Using the extraction of the ontology schema and instance data of the medicine domain, including the facts pointing to semantically related domains, as an example, the paper explains the principle and implementation process of the method in detail and describes the algorithms of the key processes. Finally, the method is evaluated and compared with related methods in terms of work objectives, software tools used, processing results, processing performance, accuracy, completeness, and reusability.
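
To make the obstacles named in the abstract concrete, the following is a minimal Python sketch of one way to stream a dump line by line, skip ill-formed triples, and keep only facts that touch one domain. It is not the authors' algorithm, which the abstract does not spell out. It assumes the dump is tab-separated N-Triples under the `http://rdf.freebase.com/ns/` namespace and that a domain's schema terms share a prefix such as `medicine.`; the function names and the file name in the usage note are hypothetical.

```python
import gzip

# Assumption: Freebase RDF terms live under this namespace.
FREEBASE_NS = "http://rdf.freebase.com/ns/"


def parse_triple(line):
    """Return (subject, predicate, object) or None if the line is ill-formed.

    Assumes one triple per line: subject, predicate, object and a final "."
    separated by tab characters.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4 or parts[3] != ".":
        return None
    s, p, o = parts[:3]
    # Subject and predicate must be IRIs in angle brackets.
    for term in (s, p):
        if not (term.startswith("<") and term.endswith(">")):
            return None
    if not o:
        return None
    return s, p, o


def in_domain(term, domain):
    """True if the term is an IRI such as <.../ns/medicine.disease>."""
    return term.startswith("<" + FREEBASE_NS + domain + ".")


def extract_domain(lines, domain="medicine"):
    """Collect triples whose predicate or object belongs to the domain.

    Returns (kept_triples, number_of_skipped_ill_formed_lines).
    """
    kept, skipped = [], 0
    for line in lines:
        triple = parse_triple(line)
        if triple is None:
            skipped += 1  # tolerate bad input instead of aborting the load
            continue
        _, p, o = triple
        if in_domain(p, domain) or in_domain(o, domain):
            kept.append(triple)
    return kept, skipped


def extract_from_dump(path, domain="medicine"):
    """Stream a gzip-compressed dump so it is never loaded whole into memory."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return extract_domain(handle, domain)
```

A call such as `extract_from_dump("freebase-rdf-latest.gz")` (hypothetical file name) would read the archive one line at a time. For a full dump the kept triples would more sensibly be written to an output file than collected in a list; the list is used here only to keep the sketch short.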

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
