Concept and benchmark results for Big Data energy forecasting based on Apache Spark

Jorge Ángel González Ordiano; Andreas Bartschat; Nicole Ludwig; Eric Braun; Simon Waczowicz; Nicolas Renkamp; Nico Peter; Clemens Düpmeier; Ralf Mikut; Veit Hagenmeyer

doi:10.1186/s40537-018-0119-6

Journal of Big Data (Mar 2018)

Affiliations

Jorge Ángel González Ordiano: Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology
Andreas Bartschat: Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology
Nicole Ludwig: Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology
Eric Braun: Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology
Simon Waczowicz: Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology
Nicolas Renkamp: Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology
Nico Peter: Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology
Clemens Düpmeier: Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology
Ralf Mikut: Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology
Veit Hagenmeyer: Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology

Journal volume & issue: Vol. 5, no. 1
pp. 1 – 11

Abstract

This article describes a concept for creating and applying energy forecasting models in a distributed environment. It also presents a benchmark comparing the time required to train and apply data-driven forecasting models on a single computer and on a computing cluster. The comparison is based on a simulated dataset and uses both R and Apache Spark. The results indicate certain cases in which distributed computing based on Spark may be advantageous.

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
