Complex & Intelligent Systems (Aug 2022)

Exploiting lexical patterns for knowledge graph construction from unstructured text in Spanish

  • Ana B. Rios-Alvarado,
  • Jose L. Martinez-Rodriguez,
  • Andrea G. Garcia-Perez,
  • Tania Y. Guerrero-Melendez,
  • Ivan Lopez-Arevalo,
  • Jose Luis Gonzalez-Compean

DOI
https://doi.org/10.1007/s40747-022-00805-7
Journal volume & issue
Vol. 9, no. 2
pp. 1281 – 1297

Abstract

Knowledge graphs (KGs) are useful data structures for the integration, retrieval, dissemination, and inference of information in various domains. One of the main challenges in building KGs is the extraction of named entities (nodes) and their relations (edges), particularly when processing unstructured text, which carries no semantic descriptions. Generating KGs from texts written in Spanish remains a research challenge because the existing structures, models, and strategies designed for other languages are not directly applicable to Spanish. This paper proposes a method to design and construct KGs from unstructured text in Spanish. We defined lexical patterns to extract named entities and taxonomic, non-taxonomic, equivalence, and composition relations. Next, named entities are linked and enriched with DBpedia resources through a strategy based on SPARQL queries. Finally, OWL properties are defined from the predicate relations to create Resource Description Framework (RDF) triples. We evaluated the proposed method to determine how many elements it extracts from the input text and to assess their quality using standard information retrieval measures. The evaluation showed that the proposed method can feasibly extract RDF triples from datasets in the general and computer science domains. The method also achieved competitive results when compared with an existing approach from the literature.
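As an illustration of the pattern-based extraction step described above, the following minimal Python sketch matches three Spanish cue phrases and emits RDF-style triples. It is not the authors' implementation: the regular expressions, the relation labels, the helper names (`strip_article`, `extract_relation`, `to_ntriple`), and the `http://example.org/` namespace are all assumptions made for this example, since the abstract does not list the actual patterns or vocabulary.

```python
import re
from urllib.parse import quote

# Hypothetical lexical patterns (assumptions, not the paper's patterns).
# Each pattern captures a subject and an object around a Spanish cue phrase.
PATTERNS = [
    ("taxonomic",
     re.compile(r"^(?P<subj>.+?) es (?:un|una) (?P<obj>.+?)\.?$")),
    ("equivalence",
     re.compile(r"^(?P<subj>.+?) también (?:es )?conocid[oa] como (?P<obj>.+?)\.?$")),
    ("composition",
     re.compile(r"^(?P<subj>.+?) (?:está|están) compuest[oa]s? (?:por|de) (?P<obj>.+?)\.?$")),
]

# Leading Spanish articles are dropped so that "Una tripleta" becomes "tripleta".
ARTICLE = re.compile(r"^(?:el|la|los|las|un|una)\s+", re.IGNORECASE)

# Hypothetical namespace used only for this example.
BASE = "http://example.org/"


def strip_article(text):
    """Remove one leading article and surrounding whitespace."""
    return ARTICLE.sub("", text.strip())


def extract_relation(sentence):
    """Return (subject, relation_label, object) for the first matching pattern, else None."""
    sentence = sentence.strip()
    for label, pattern in PATTERNS:
        match = pattern.match(sentence)
        if match:
            return (strip_article(match["subj"]), label, strip_article(match["obj"]))
    return None


def to_ntriple(triple):
    """Render a triple as one N-Triples-style line under the example namespace."""
    subj, label, obj = triple

    def uri(kind, name):
        # Spaces become underscores; non-ASCII characters are percent-encoded.
        return f"<{BASE}{kind}/{quote(name.replace(' ', '_'))}>"

    return f"{uri('resource', subj)} {uri('relation', label)} {uri('resource', obj)} ."


if __name__ == "__main__":
    for text in [
        "Python es un lenguaje de programación.",
        "Un grafo de conocimiento también es conocido como KG.",
        "Una tripleta RDF está compuesta por un sujeto.",
    ]:
        triple = extract_relation(text)
        if triple:
            print(to_ntriple(triple))
```

A real pipeline would additionally need sentence segmentation, named entity recognition, and the DBpedia linking step; the sketch only shows how a cue phrase can separate the subject from the object of a candidate relation.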

Keywords