Energy and AI (Jan 2024)

Generation of meaningful synthetic sensor data — Evaluated with a reliable transferability methodology

  • Michael Meiser
  • Benjamin Duppe
  • Ingo Zinnikus

Journal volume & issue
Vol. 15
p. 100308

Abstract

As households are equipped with smart meters, supervised Machine Learning (ML) models and especially Non-Intrusive Load Monitoring (NILM) disaggregation algorithms are becoming increasingly important. To be robust, these models require a large amount of data, which is difficult to collect. Consequently, the generation of meaningful synthetic data is becoming more relevant. We use a simulation framework to generate multiple datasets using different techniques and evaluate their quality statistically by measuring the performance of NILM models for transferability. We demonstrate that the method of data generation is crucial to train ML models in a meaningful way. The experiments conducted reveal that adding noise to the synthetic smart meter data is essential to train robust NILM models for transferability. The best results are obtained when this noise is derived from unknown appliances for which no ground truth data is available. Since we observed that NILM models can provide unstable results, we develop a reliable evaluation methodology, based on Cochran’s sample size. Finally, we compare the quality of the generated synthetic data with real data and observe that multiple NILM models trained on synthetic data perform significantly better than those trained on real data.
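The abstract names Cochran's sample size as the basis of the evaluation methodology but does not say how it is applied. As a minimal sketch, the standard Cochran formula n0 = z^2 * p * (1 - p) / e^2, with an optional finite population correction, can be computed as below. The function name, the default confidence level, margin of error, and proportion are illustrative assumptions and are not taken from the paper.

```python
import math
from statistics import NormalDist


def cochran_sample_size(confidence=0.95, margin=0.05, p=0.5, population=None):
    """Return the sample size from Cochran's formula, rounded up.

    All defaults are illustrative assumptions, not values from the paper.
    confidence: two-sided confidence level
    margin:     tolerated margin of error e
    p:          assumed proportion (0.5 is the most conservative choice)
    population: optional finite population size N for the correction
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for a large (effectively infinite) population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    print(cochran_sample_size())
    print(cochran_sample_size(population=1000))
```

With the assumed defaults (95% confidence, 5% margin, p = 0.5) the formula gives the familiar figure of 385. In an evaluation setting of the kind the abstract describes, such a number could plausibly serve as the count of repeated training and evaluation runs needed before an averaged metric is considered stable, though that reading is an assumption here.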

Keywords