PeerJ Computer Science (Oct 2023)

ELI: an IoT-aware big data pipeline with data curation and data quality

  • Francisco José de Haro-Olmo,
  • Alvaro Valencia-Parra,
  • Ángel Jesús Varela-Vaca,
  • José Antonio Álvarez-Bermejo,
  • María Teresa Gómez-López

DOI
https://doi.org/10.7717/peerj-cs.1605
Journal volume & issue
Vol. 9
p. e1605

Abstract


Analysing data from IoT sensors is complex enough to require Big Data technologies, which in turn poses challenges such as data curation and data quality assessment. Neglecting these aspects can lead to erroneous decision-making (e.g., processing incorrectly treated data, introducing errors into processes, causing damage or increasing costs). This article presents ELI, an IoT-based Big Data pipeline for developing a data curation process and assessing the usability of data collected by IoT sensors in both offline and online scenarios. We propose a pipeline that combines data transformation and integration tools with a customisable decision model based on the Decision Model and Notation (DMN) to evaluate data quality. Our study emphasises the importance of data curation and quality when integrating IoT information: low-quality data that obstruct meaningful insights and introduce errors into decision-making are identified and discarded. We evaluated our approach in a smart farm scenario using agricultural humidity and temperature data collected from various types of sensors. The proposed model exhibited consistent results in offline and online (stream data) scenarios. A performance evaluation was also carried out, demonstrating the effectiveness of the approach. In summary, this article contributes a usable and effective IoT-based Big Data pipeline that supports data curation and data usability assessment in both online and offline scenarios, and it introduces customisable decision models for measuring data quality across multiple dimensions.
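
To make the idea of a decision model for data usability more concrete, the following is a minimal sketch, not ELI's actual implementation. It imitates a DMN decision table with a first-hit policy in plain Python. The `Reading` type, the rule list, the `assess` and `curate` functions, and all thresholds (0 to 100 % humidity, -40 to 60 °C) are assumptions chosen purely for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


@dataclass
class Reading:
    """Hypothetical sensor reading; field names are illustrative only."""
    sensor_id: str
    temperature_c: Optional[float]
    humidity_pct: Optional[float]


# A rule pairs a condition with an outcome, like one row of a decision table.
Rule = Tuple[Callable[[Reading], bool], str]

# Rules are checked in order and the first match wins (first-hit policy).
# The completeness rule comes first so later rules never compare None.
# All thresholds below are assumed values, not taken from the article.
USABILITY_RULES: List[Rule] = [
    (lambda r: r.temperature_c is None or r.humidity_pct is None, "discard"),
    (lambda r: not (0.0 <= r.humidity_pct <= 100.0), "discard"),
    (lambda r: not (-40.0 <= r.temperature_c <= 60.0), "discard"),
]


def assess(reading: Reading) -> str:
    """Return the outcome of the first matching rule, or 'usable' if none match."""
    for condition, outcome in USABILITY_RULES:
        if condition(reading):
            return outcome
    return "usable"


def curate(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Lazily yield only usable readings; accepts a list or a stream."""
    for reading in readings:
        if assess(reading) == "usable":
            yield reading
```

Because `assess` looks at one reading at a time, the same rules can in principle be applied to a stored batch (offline) or to a generator fed by a message stream (online), which mirrors the consistency between the two scenarios that the abstract reports.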

Keywords