Sensors (Jul 2022)

Distance- and Momentum-Based Symbolic Aggregate Approximation for Highly Imbalanced Classification

  • Dong-Hyuk Yang,
  • Yong-Shin Kang

DOI
https://doi.org/10.3390/s22145095
Journal volume & issue
Vol. 22, no. 14
p. 5095

Abstract


Time-series representation is the most important task in time-series analysis. One of the most widely employed time-series representation methods is symbolic aggregate approximation (SAX), which converts the results of piecewise aggregate approximation into a symbol sequence. SAX is simple and effective; however, it considers only the mean value of each segment of the time series. Here, we propose a novel time-series representation method, distance- and momentum-based symbolic aggregate approximation (DM-SAX), which preserves the distribution of the time series by calculating the perpendicular distance from the time axis to each data point and accounts for the trend of the time series by adding a momentum factor that reflects the direction of previous data points. Experimental results for 29 highly imbalanced classification problems from the UCR datasets showed that DM-SAX achieved the highest area under the curve (AUC) among the competing time-series representation methods (SAX, extreme-SAX, overlap-SAX, and distance-based SAX). We statistically verified that the performance improvements resulted in significant differences in the rankings. In addition, DM-SAX yielded the highest AUC on a real-world wire cutting and crimping process dataset. Meaningful data points, such as outliers, could also be identified in a time-series outlier detection framework using the proposed method.
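
For readers unfamiliar with the baseline, the following is a minimal sketch of classic SAX only (z-normalization, piecewise aggregate approximation, then symbol assignment). It is not the DM-SAX method, whose distance and momentum terms are not specified in this abstract. The function names `paa` and `sax`, the default four-letter alphabet, and the use of equiprobable standard-normal breakpoints are assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each segment."""
    x = np.asarray(series, dtype=float)
    # array_split tolerates lengths that are not divisible by n_segments
    return np.array([seg.mean() for seg in np.array_split(x, n_segments)])


def sax(series, n_segments, alphabet="abcd"):
    """Classic SAX (illustrative sketch): map each segment mean to a symbol."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    # z-normalize; a constant series is mapped to all zeros
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    means = paa(z, n_segments)
    k = len(alphabet)
    # Breakpoints that split the standard normal into k equiprobable regions
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    # Index of the region each segment mean falls into
    idx = np.searchsorted(breakpoints, means)
    return "".join(alphabet[i] for i in idx)


print(sax(range(8), 4))  # -> abcd
```

Because each symbol depends only on the segment mean, two segments with the same mean but different spread or direction receive the same symbol, which is the limitation the abstract attributes to SAX.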

Keywords