IEEE Access (Jan 2022)

Introducing Polyglot-Based Data-Flow Awareness to Time-Series Data Stores

  • Carlos Garcia Calatrava,
  • Yolanda Becerra,
  • Fernando M. Cucchietti

DOI
https://doi.org/10.1109/ACCESS.2022.3187405
Journal volume & issue
Vol. 10
pp. 69398 – 69411

Abstract

The rising interest in extracting value from data has led to a broad proliferation of monitoring infrastructures, most notably composed of sensors, intended to collect this new oil. Gathering data has thus become fundamental for a great number of applications, such as predictive maintenance techniques or anomaly detection algorithms. However, before data can be refined into insights and knowledge, it has to be efficiently stored and prepared for later retrieval. As a consequence of this sensor and IoT boom, time-series databases (TSDBs), designed to manage sensor data, have been the fastest-growing database category since 2019. Here we propose a holistic approach intended to improve the performance and efficiency of TSDBs. More precisely, we introduce and evaluate a novel polyglot-based approach that aims to tailor the data store not only to time-series data, as is done conventionally, but also to the data flow itself, from ingestion to retrieval. To evaluate the approach, we materialize it in an alternative implementation of NagareDB, a resource-efficient time-series database based on MongoDB, which is in turn the most popular NoSQL storage solution. After implementing our approach in the database, we observe a global speedup, with queries solved up to 12 times faster than with MongoDB’s recently launched time-series capability, and performance that generally exceeds that of InfluxDB, the most popular time-series database. Our polyglot-based, data-flow-aware solution can ingest data more than twice as fast as MongoDB, InfluxDB, and NagareDB’s original implementation, while using the same disk space as InfluxDB and half of that required by MongoDB.

Keywords