SPARQL2Flink: Evaluation of SPARQL Queries on Apache Flink

Oscar Ceballos; Carlos Alberto Ramírez Restrepo; María Constanza Pabón; Andres M. Castillo; Oscar Corcho

doi:10.3390/app11157033

Applied Sciences (Jul 2021)

Affiliations

Oscar Ceballos: Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Ciudad Universitaria Meléndez Calle 13 No. 100-00, Cali 760032, Colombia
Carlos Alberto Ramírez Restrepo: Departamento de Electrónica y Ciencias de la Computación, Pontificia Universidad Javeriana Cali, Calle 18 No. 118-250, Cali 760031, Colombia
María Constanza Pabón: Departamento de Electrónica y Ciencias de la Computación, Pontificia Universidad Javeriana Cali, Calle 18 No. 118-250, Cali 760031, Colombia
Andres M. Castillo: Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Ciudad Universitaria Meléndez Calle 13 No. 100-00, Cali 760032, Colombia
Oscar Corcho: Ontology Engineering Group, Universidad Politécnica de Madrid, Campus de Montegancedo, Boadilla del Monte, 28660 Madrid, Spain

Journal volume & issue: Vol. 11, no. 15, p. 7033

Abstract

Existing SPARQL query engines and triple stores are continuously being improved to handle ever larger datasets. In this context, several approaches have been proposed for storing and querying RDF data in a distributed fashion, mainly using the MapReduce programming model and Hadoop-based ecosystems. Newer Big Data technologies such as Apache Spark and Apache Flink have also emerged; they use distributed in-memory processing and promise higher data processing performance. In this paper, we present a formal interpretation of some PACT transformations implemented in the Apache Flink DataSet API. We use this formalization to define a mapping that translates a SPARQL query into a Flink program. The mapping was implemented in a prototype, which was used to evaluate the correctness and performance of the solution. The source code of the project is available on GitHub under the MIT license.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
