Applied Sciences (Jul 2024)
Metrics for Evaluating Synthetic Time-Series Data of Battery
Abstract
Advances in artificial intelligence have encouraged the application of deep learning in various fields. However, the accuracy of deep learning algorithms depends on the quality of the dataset used, so a high-quality dataset is critical. Data augmentation algorithms can generate large, high-quality datasets, whose quality is mainly assessed through qualitative and quantitative evaluations. However, conventional qualitative evaluation methods lack the objective, quantitative parameters necessary for synthetic battery datasets. Therefore, this study proposes the rate of change in linear regression correlation coefficients and two clustering indices, the Dunn index and the silhouette coefficient, for quantitatively evaluating the quality of synthetic time-series datasets of batteries. To verify the reliability of the proposed method, we applied the TimeGAN algorithm to an open-source battery dataset to generate a synthetic battery dataset and then compared its similarity to the original dataset using the proposed evaluation method. The silhouette coefficient was confirmed as the most reliable index. Furthermore, the similarity between the datasets increased as the silhouette coefficient decreased from 0.1053 to 0.0073 with the number of learning iterations. The results demonstrate that shortcomings in the quality of datasets used for deep learning can be overcome and supplemented. Furthermore, data similarity can be efficiently evaluated regardless of the learning environment. In conclusion, we present a new evaluation method for synthetic time-series datasets that is more reliable than the conventional representative evaluation method (the training loss rate).
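The use of the silhouette coefficient as a similarity measure can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not say how clusters, features, or distances are defined. The sketch assumes the two "clusters" are simply the original and synthetic samples, that each sample is a fixed-length window compared by Euclidean distance, and that the toy voltage-like data, noise level, and offset are invented for illustration. Under these assumptions, a coefficient near 0 means the two sets overlap (high similarity), while a value near 1 means they are well separated.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(samples, labels):
    """Mean silhouette coefficient of `samples` under the given `labels`."""
    scores = []
    for i, (xi, li) in enumerate(zip(samples, labels)):
        # Distances from sample i to every other sample, grouped by label.
        dists = {}
        for j, (xj, lj) in enumerate(zip(samples, labels)):
            if i != j:
                dists.setdefault(lj, []).append(euclidean(xi, xj))
        others = [sum(d) / len(d) for lab, d in dists.items() if lab != li]
        if li not in dists or not others:
            # Singleton cluster or only one label present: score taken as 0.
            scores.append(0.0)
            continue
        a = sum(dists[li]) / len(dists[li])  # mean distance within own set
        b = min(others)                      # mean distance to nearest other set
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: short voltage-like windows with Gaussian noise.
rng = random.Random(0)
base = [4.2 - 0.1 * t for t in range(8)]


def noisy_windows(n, offset):
    """Generate n noisy copies of the base window, shifted by `offset`."""
    return [[v + offset + rng.gauss(0.0, 0.05) for v in base] for _ in range(n)]


original = noisy_windows(30, 0.0)
good_synthetic = noisy_windows(30, 0.0)   # same distribution as the original
poor_synthetic = noisy_windows(30, 0.5)   # systematically shifted

labels = [0] * 30 + [1] * 30  # 0 = original, 1 = synthetic
s_good = mean_silhouette(original + good_synthetic, labels)
s_poor = mean_silhouette(original + poor_synthetic, labels)
print(f"good synthetic: {s_good:.4f}, poor synthetic: {s_poor:.4f}")
```

In this toy setting the shifted set yields a much larger coefficient than the set drawn from the same distribution as the original. That matches the direction of the trend reported in the abstract, where a decreasing silhouette value accompanies increasing similarity.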
Keywords