IEEE Access (Jan 2018)
Research on the Method of Extracting Domain Knowledge From the Freebase RDF Dumps
Abstract
When constructing an ontology-based domain semantic knowledge base, reusing existing domain knowledge bases both facilitates the sharing, integration, and reuse of the resulting knowledge base and accelerates its construction. The open, fast-growing Freebase database is a good data source for this purpose. However, extracting domain knowledge from the Freebase Resource Description Framework (RDF) dumps poses many challenges. For example, the dump package is too large to read or load in full, it contains many unnecessary and redundant facts, and some ill-formed triples may cause loading to fail. To address these obstacles and the deficiencies of existing research, this paper proposes a method for extracting domain knowledge quickly, accurately, and completely from the Freebase RDF dumps and for describing that knowledge using the semantic constructs of standard ontology description languages. Using the extraction of the ontology schema and instance data of the medicine domain, including the facts that point to semantically related domains, as an example, the paper explains the principle and implementation process of the method in detail and describes the algorithms of its key processes. Finally, the method is evaluated and compared with related methods in terms of work objectives, software tools used, processing results, processing performance, accuracy, completeness, and reusability.
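The three obstacles named in the abstract (a dump too large to load, irrelevant facts, and ill-formed triples) can be illustrated with a small streaming filter. The sketch below is not the paper's algorithm. It assumes the dump is a gzip-compressed file with one tab-separated triple per line ending in a `.` field, and that domain membership can be approximated by the predicate prefix `http://rdf.freebase.com/ns/medicine.`. The function names `parse_line`, `extract_domain`, and `extract_from_dump` are hypothetical, and the sample identifiers are illustrative only.

```python
import gzip
from typing import Iterable, Iterator, Optional, Tuple

# Assumed namespace prefix of Freebase RDF identifiers.
FREEBASE_NS = "http://rdf.freebase.com/ns/"

Triple = Tuple[str, str, str]


def parse_line(line: str) -> Optional[Triple]:
    """Split one dump line into (subject, predicate, object).

    Returns None for ill-formed lines instead of raising, so a single
    bad triple cannot abort the whole extraction.
    """
    parts = line.rstrip("\n").split("\t")
    # Assumed layout: subject, predicate, object, terminating ".".
    if len(parts) != 4 or parts[3].strip() != ".":
        return None
    subject, predicate, obj = parts[0], parts[1], parts[2]
    for term in (subject, predicate):
        if not (term.startswith("<") and term.endswith(">")):
            return None
    if not obj:
        return None
    return subject, predicate, obj


def extract_domain(lines: Iterable[str], domain: str = "medicine") -> Iterator[Triple]:
    """Yield only triples whose predicate belongs to the given domain.

    Filtering on the predicate prefix is a simplification chosen for
    this sketch, not the selection rule used in the paper.
    """
    prefix = "<" + FREEBASE_NS + domain + "."
    for line in lines:
        triple = parse_line(line)
        if triple is None:
            continue  # skip ill-formed input
        if triple[1].startswith(prefix):
            yield triple


def extract_from_dump(path: str, domain: str = "medicine") -> Iterator[Triple]:
    """Stream a gzip-compressed dump line by line, never loading it whole."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from extract_domain(handle, domain)


# Illustrative input: one medicine fact, one unrelated fact, one bad line.
sample = [
    "<http://rdf.freebase.com/ns/m.01>\t<http://rdf.freebase.com/ns/medicine.disease.symptoms>\t<http://rdf.freebase.com/ns/m.02>\t.\n",
    "<http://rdf.freebase.com/ns/m.01>\t<http://rdf.freebase.com/ns/music.artist.genre>\t<http://rdf.freebase.com/ns/m.03>\t.\n",
    "this line is not a triple\n",
]
kept = list(extract_domain(sample))
```

Because both functions are generators, memory use stays roughly constant regardless of dump size. Collecting the facts that point to semantically related domains, as the abstract describes, would need at least one further pass over the data and is outside this sketch.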
Keywords