Improving Time Series Regression Model Accuracy via Systematic Training Dataset Augmentation and Sampling

Robin Ströbel; Marcus Mau; Alexander Puchta; Jürgen Fleischer

doi:10.3390/make6020049

Machine Learning and Knowledge Extraction (May 2024)

Improving Time Series Regression Model Accuracy via Systematic Training Dataset Augmentation and Sampling

Robin Ströbel,
Marcus Mau,
Alexander Puchta,
Jürgen Fleischer

Affiliations

Robin Ströbel: wbk Institute of Production Science, Karlsruhe Institute of Technology, Kaiserstraße 12, 76131 Karlsruhe, Germany
Marcus Mau: wbk Institute of Production Science, Karlsruhe Institute of Technology, Kaiserstraße 12, 76131 Karlsruhe, Germany
Alexander Puchta: wbk Institute of Production Science, Karlsruhe Institute of Technology, Kaiserstraße 12, 76131 Karlsruhe, Germany
Jürgen Fleischer: wbk Institute of Production Science, Karlsruhe Institute of Technology, Kaiserstraße 12, 76131 Karlsruhe, Germany

DOI: https://doi.org/10.3390/make6020049
Journal volume & issue: Vol. 6, no. 2
pp. 1072 – 1086

Abstract

Read online

This study addresses a significant gap in the field of time series regression modeling by highlighting the central role of data augmentation in improving model accuracy. The primary objective is to present a detailed methodology for systematic sampling of training datasets through data augmentation to improve the accuracy of time series regression models. Therefore, different augmentation techniques are compared to evaluate their impact on model accuracy across different datasets and model architectures. In addition, this research highlights the need for a standardized approach to creating training datasets using multiple augmentation methods. The lack of a clear framework hinders the easy integration of data augmentation into time series regression pipelines. Our systematic methodology promotes model accuracy while providing a robust foundation for practitioners to seamlessly integrate data augmentation into their modeling practices. The effectiveness of our approach is demonstrated using process data from two milling machines. Experiments show that the optimized training dataset improves the generalization ability of machine learning models in 86.67% of the evaluated scenarios. However, the prediction accuracy of models trained on a sufficient dataset remains largely unaffected. Based on these results, sophisticated sampling strategies such as Quadratic Weighting of multiple augmentation approaches may be beneficial.

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make

About the journal

Abstract

Keywords