Journal of Big Data (Jun 2024)

On data efficiency of univariate time series anomaly detection models

  • Wu Sun,
  • Hui Li,
  • Qingqing Liang,
  • Xiaofeng Zou,
  • Mei Chen,
  • Yanhao Wang

DOI
https://doi.org/10.1186/s40537-024-00940-7
Journal volume & issue
Vol. 11, no. 1
pp. 1–31

Abstract

In machine learning (ML) problems, it is widely believed that more training samples lead to higher predictive accuracy but incur higher computational costs. Consequently, achieving better data efficiency, that is, a better trade-off between the size of the training set and the accuracy of the output model, becomes a key problem in ML applications. In this research, we systematically investigate the data efficiency of Univariate Time Series Anomaly Detection (UTS-AD) models. We first experimentally examine the performance of nine popular UTS-AD algorithms as a function of the training sample size on several benchmark datasets. Our findings confirm that most algorithms become more accurate as more training samples are used, while the marginal gain from adding further samples gradually diminishes. Based on these observations, we propose a novel framework called FastUTS-AD that achieves improved data efficiency and reduced computational overhead compared to existing UTS-AD models with little loss of accuracy. Specifically, FastUTS-AD is compatible with different UTS-AD models, using a sampling- and scaling-law-based heuristic to automatically determine the number of training samples a UTS-AD model needs to achieve predictive performance close to that obtained when all samples in the training set are used. Comprehensive experimental results show that, for the nine popular UTS-AD algorithms tested, FastUTS-AD reduces the number of training samples and the training time by 91.09–91.49% and 93.49–93.82% on average, respectively, without a significant decrease in accuracy.
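
To make the heuristic concrete, below is a minimal Python sketch of one plausible reading of the sampling- and scaling-law-based stopping rule described in the abstract. It is not the paper's actual algorithm: `model_factory`, `evaluate`, `start_frac`, `growth`, and `tol` are hypothetical names, and a scikit-learn-style `fit` interface is assumed for the underlying UTS-AD detector.

```python
import numpy as np

def fast_uts_ad_sketch(model_factory, train_series, val_series, val_labels,
                       evaluate, start_frac=0.01, growth=2.0, tol=0.01):
    """Illustrative sketch (not the paper's exact procedure): grow the
    training sample geometrically and stop once the marginal accuracy
    gain falls below `tol`, mimicking the flattening of a scaling law."""
    n = len(train_series)
    size = max(1, int(start_frac * n))
    prev_score = -np.inf
    best_model, best_size = None, 0
    while size <= n:
        model = model_factory()                 # fresh, untrained detector
        model.fit(train_series[:size])          # train on a subsample only
        score = evaluate(model, val_series, val_labels)  # e.g., F1-score
        if best_model is not None and score - prev_score < tol:
            break                               # scaling curve has flattened
        prev_score, best_model, best_size = score, model, size
        size = int(size * growth)               # geometric sample growth
    return best_model, best_size
```

Under these assumptions, geometric growth keeps the number of training rounds logarithmic in the full sample size, so the early-stopping check stays cheap relative to training on all samples, which is consistent with the large reductions in training time reported in the abstract.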

Keywords