Generation of meaningful synthetic sensor data — Evaluated with a reliable transferability methodology

Michael Meiser; Benjamin Duppe; Ingo Zinnikus

Energy and AI (Jan 2024)

Affiliations

Michael Meiser (corresponding author): Deutsches Forschungszentrum für künstliche Intelligenz (DFKI), Stuhlsatzenhausweg 3, Saarbrücken, 66123, Saarland, Germany
Benjamin Duppe: Deutsches Forschungszentrum für künstliche Intelligenz (DFKI), Stuhlsatzenhausweg 3, Saarbrücken, 66123, Saarland, Germany
Ingo Zinnikus: Deutsches Forschungszentrum für künstliche Intelligenz (DFKI), Stuhlsatzenhausweg 3, Saarbrücken, 66123, Saarland, Germany

Journal volume & issue: Vol. 15, p. 100308

Abstract

As households are equipped with smart meters, supervised Machine Learning (ML) models and especially Non-Intrusive Load Monitoring (NILM) disaggregation algorithms are becoming increasingly important. To be robust, these models require a large amount of data, which is difficult to collect. Consequently, the generation of meaningful synthetic data is becoming more relevant. We use a simulation framework to generate multiple datasets using different techniques and evaluate their quality statistically by measuring the performance of NILM models for transferability. We demonstrate that the method of data generation is crucial to train ML models in a meaningful way. The experiments conducted reveal that adding noise to the synthetic smart meter data is essential to train robust NILM models for transferability. The best results are obtained when this noise is derived from unknown appliances for which no ground truth data is available. Since we observed that NILM models can provide unstable results, we develop a reliable evaluation methodology, based on Cochran’s sample size. Finally, we compare the quality of the generated synthetic data with real data and observe that multiple NILM models trained on synthetic data perform significantly better than those trained on real data.
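For readers unfamiliar with Cochran's sample size, the following is a minimal sketch of the textbook formula for proportions, n0 = z^2 * p * (1 - p) / e^2, with an optional finite population correction. The abstract does not state which variant or which parameter values the authors use, so the function name `cochran_sample_size`, the default p = 0.5, and the example confidence level and margin of error are illustrative assumptions, not the paper's settings.

```python
import math
from statistics import NormalDist


def cochran_sample_size(confidence, margin_of_error, p=0.5, population=None):
    """Textbook Cochran sample size for a proportion (hypothetical helper).

    confidence      -- two-sided confidence level, e.g. 0.95
    margin_of_error -- tolerated absolute error e, e.g. 0.05
    p               -- assumed proportion; 0.5 is the most conservative choice
    population      -- if given, apply the finite population correction
    """
    # z-score for a two-sided interval at the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    if population is not None:
        # Finite population correction: n = n0 / (1 + (n0 - 1) / N)
        n0 = n0 / (1 + (n0 - 1) / population)
    # Round up, since a sample size must be a whole number
    return math.ceil(n0)


if __name__ == "__main__":
    print(cochran_sample_size(0.95, 0.05))
    print(cochran_sample_size(0.95, 0.05, population=1000))
```

In an evaluation setting, such a number could be read as how many repeated runs or samples are needed before an averaged metric is considered stable; how the authors map the formula onto NILM model evaluation is described in the full paper, not in this abstract.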

Published in Energy and AI

ISSN: 2666-5468 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://www.journals.elsevier.com/energy-and-ai
