IEEE Access (Jan 2020)

Model-Driven Development of Web APIs to Access Integrated Tabular Open Data

  • Cesar Gonzalez-Mora,
  • David Tomas,
  • Irene Garrigos,
  • Jose Jacobo Zubcoff,
  • Jose-Norberto Mazon

DOI
https://doi.org/10.1109/ACCESS.2020.3036462
Journal volume & issue
Vol. 8
pp. 202669 – 202686

Abstract

More and more governments around the world are publishing tabular open data, mainly in formats such as CSV or XLS(X). These datasets are mostly published individually, i.e., each publisher exposes its data on the Web without considering potential relationships with other datasets (its own or those of other publishers). As a result, reusing several open datasets together is not a trivial task and requires mechanisms that allow data consumers (such as software developers or data scientists) to integrate and access tabular open data published on the Web. In this paper, we propose a model-driven approach to automatically generate Web APIs that homogeneously access multiple integrated tabular open datasets. This work focuses on data that can be integrated by means of join and union operations. As a first step, our approach detects unionable and joinable tabular open data by using a table similarity measure based on word embeddings. Then, an APIfication process is developed to create APIs that access the previously integrated datasets through a single endpoint. A running example is presented throughout the article, together with a set of performance experiments that show the feasibility of our approach.
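The integration idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' generated code: the sample tables, the column names and the `read_table`, `union`, `join` and `query` helpers are all hypothetical, and the `query` function only stands in for the single API endpoint. The detection step based on word embeddings is not shown; the sketch assumes the unionable and joinable tables are already known.

```python
import csv
import io

# Hypothetical sample data; not taken from the paper.
CITY_A = "district,population\nNorth,1200\nSouth,900\n"
CITY_B = "district,population\nEast,700\n"
AREAS = "district,area_km2\nNorth,10\nSouth,8\nEast,5\n"


def read_table(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def union(left, right):
    """Stack two tables that share the same columns."""
    return left + right


def join(left, right, key):
    """Inner join: merge rows whose key values match."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def query(rows, **filters):
    """Single access point over the integrated table (stand-in for an endpoint)."""
    return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


# Union the two population tables, then join the result with the area table.
population = union(read_table(CITY_A), read_table(CITY_B))
integrated = join(population, read_table(AREAS), "district")

result = query(integrated, district="East")
print(result)
```

A consumer then only needs the one `query` entry point, regardless of how many source files were combined behind it.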

Keywords