IEEE Access (Jan 2021)
WebQuIn-LD: A Method of Integrating Web Query Interfaces Based on Linked Data
Abstract
The deep web is a huge source of domain-specific information (house sales, medical information, e-commerce, science, etc.) stored in database servers that are accessible through HTML forms called web query interfaces (WQIs). Information in the deep web is typically retrieved by querying one database server at a time, which is inefficient. A more attractive approach is to create an integrated WQI (IWQI) that acts as a single entry point for querying several database servers at once for a given domain. Schema matching and string comparison of WQI labels have been the most popular techniques for creating IWQIs. In this work, we present WebQuIn-LD, a novel alternative method that relies on linked data principles and the View-based Data Integration System (VDIS) architecture to combine individual WQIs into a single IWQI for a given domain. WebQuIn-LD follows a data integration system architecture, from the wrapping of domain-specific WQIs to the creation of the IWQI. A domain-independent ontology is created to describe WQI elements as linked data resources and to support semantic integration among those elements. WebQuIn-LD was evaluated in terms of precision, recall, and F1 on state-of-the-art WQI datasets covering different domains (airfares, books, autos, jobs, music, movies, hotels). The results demonstrate the effectiveness of the proposed linked data approach for the WQI integration problem.
Keywords