Automated extraction of concept matcher thesaurus from semi-structured catalogue-like sources of data on the web

Maxim Lapaev

doi:10.1109/FRUCT-ISPIT.2016.7561521

Proceedings of the XXth Conference of Open Innovations Association FRUCT (Apr 2016)

Affiliations

Maxim Lapaev: ITMO University, St. Petersburg, Russia

Journal volume & issue: Vol. 664, no. 18
pp. 153 – 160

Abstract

Designing an ontology and populating a data-set with knowledge according to the chosen or developed ontology, in line with the principles of the Semantic Web and Linked Open Data, are time-consuming, iterative processes that require either expert knowledge or a set of tools for scraping data from the web. A valid and consistent ontology, and consistent knowledge within the data-set, require unification of concepts, which means overcoming the ambiguity and synonymy of the terms that become individuals of the ontology. In this paper we focus on the techniques used to organise a Russian food product data-set under a lightweight FOOD Ontology, and on concept matching in particular. The main approaches to data-set concept unification, synonymic term matching, and ways of collecting dictionaries for the matcher are outlined. A tool for parsing catalogue-like semi-structured resources and extracting a thesaurus from them is developed and introduced for the task of on-the-fly concept matching.
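As a rough illustration of what on-the-fly concept matching against an extracted thesaurus can look like, here is a minimal Python sketch. It assumes the catalogue has already been parsed into a mapping from each canonical concept to the variant names listed under it; the function names `build_thesaurus` and `match_concept`, the sample entries, and the 0.85 similarity cutoff are hypothetical, and the fuzzy fallback uses the standard library's `difflib`, which is a stand-in rather than a technique confirmed by the paper.

```python
from __future__ import annotations

import difflib
import re


def normalize(term: str) -> str:
    """Lower-case, trim, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", term.strip().lower())


def build_thesaurus(catalogue: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalized variant (and the concept name itself) to its concept.

    Assumption: `catalogue` is the hypothetical output of a catalogue parser,
    {canonical concept: [variant names listed under it]}.
    If two concepts share a variant, the first concept seen keeps it.
    """
    thesaurus: dict[str, str] = {}
    for concept, variants in catalogue.items():
        for term in [concept, *variants]:
            thesaurus.setdefault(normalize(term), concept)
    return thesaurus


def match_concept(term: str, thesaurus: dict[str, str],
                  cutoff: float = 0.85) -> str | None:
    """Resolve a raw term to a canonical concept, or None if nothing is close."""
    key = normalize(term)
    if key in thesaurus:  # exact hit on the normalized form
        return thesaurus[key]
    # Fallback (hypothetical choice): closest known variant by difflib ratio.
    close = difflib.get_close_matches(key, thesaurus.keys(), n=1, cutoff=cutoff)
    return thesaurus[close[0]] if close else None


# Hypothetical catalogue fragment, for illustration only.
thesaurus = build_thesaurus({
    "Curd": ["tvorog", "cottage cheese"],
    "Sour cream": ["smetana"],
})
print(match_concept("  Cottage   Cheese ", thesaurus))
print(match_concept("smetanna", thesaurus))
```

Exact matches on the normalized form are resolved first; the fuzzy fallback only catches near-misses such as misspellings, so genuine synonyms still have to come from the extracted thesaurus itself.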

ISSN: 2305-7254 (Print); 2343-0737 (Online)
Publisher: FRUCT
Country of publisher: Finland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication
Website: http://fruct.org/publication
