Post-processing of Deep Web Information Extraction Based on Domain Ontology

PENG, T.; LIU, L.

doi:10.4316/AECE.2013.04005

Advances in Electrical and Computer Engineering (Nov 2013)

Journal volume & issue: Vol. 13, no. 4
pp. 25 – 32

Abstract

Many methods have been used to extract and process query results from the deep Web, relying on the differing structures of Web pages and the various design modes of databases. However, these methods ignore some semantic meanings and relations. In this paper, we therefore present an approach for post-processing deep Web query results based on a domain ontology, which can exploit these semantic meanings and relations. A block identification model (BIM) based on node similarity is defined to extract the data blocks relevant to a specific domain after noisy nodes are removed. A feature vector for the book domain is obtained by a result set extraction model (RSEM) based on the vector space model (VSM). RSEM, combined with BIM, builds a domain ontology for books, which not only removes the dependence on Web page structure when extracting data but also makes use of the semantic meanings of the domain ontology. After the basic information of the Web pages is extracted, a ranking algorithm is applied to present users with an ordered list of data records. Experimental results show that BIM and RSEM extract data blocks and build the domain ontology accurately. In addition, relevant data records and basic information are extracted and ranked. The precision and recall results show that the proposed method is feasible and efficient.
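The abstract does not give RSEM's term weighting or the exact ranking formula, so the following is only a minimal sketch of the general vector space model idea it builds on. It assumes raw term-frequency vectors and cosine similarity, both standard VSM choices. The function names (`build_domain_vector`, `rank_records`) and the sample records are hypothetical illustrations, not the authors' method. A domain feature vector is aggregated from known in-domain book records, and candidate data records are then ranked by their similarity to it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tf_vector(text):
    """Sparse raw term-frequency vector (assumption: no TF-IDF or other weighting)."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_domain_vector(sample_records):
    """Hypothetical helper: sum the TF vectors of known in-domain records into one domain vector."""
    domain = Counter()
    for record in sample_records:
        domain.update(tf_vector(record))  # Counter.update adds counts
    return domain


def rank_records(records, domain_vec):
    """Hypothetical helper: return (score, record) pairs, most domain-relevant first."""
    scored = [(cosine(tf_vector(r), domain_vec), r) for r in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    domain = build_domain_vector([
        "title author isbn publisher price",
        "book title author edition isbn",
    ])
    candidates = [
        "Sign in to your account",
        "Deep Learning author Goodfellow isbn 9780262035613 publisher MIT Press",
    ]
    for score, record in rank_records(candidates, domain):
        print(f"{score:.3f}  {record}")
```

In this sketch a navigation-style record that shares no terms with the domain vector scores 0.0 and sinks to the bottom, which loosely mirrors how a domain feature vector can separate relevant data records from noise. A real system would likely add term weighting and ontology-based synonym handling.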

Published in Advances in Electrical and Computer Engineering

ISSN: 1582-7445 (Print); 1844-7600 (Online)
Publisher: Stefan cel Mare University of Suceava
Country of publisher: Romania
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: http://www.aece.ro
