Complexity (Jan 2020)

Effect Improved for High-Dimensional and Unbalanced Data Anomaly Detection Model Based on KNN-SMOTE-LSTM

  • Fuguang Bao,
  • Yongqiang Wu,
  • Zhaogang Li,
  • Yongzhao Li,
  • Lili Liu,
  • Guanyu Chen

DOI
https://doi.org/10.1155/2020/9084704
Journal volume & issue
Vol. 2020

Abstract

Read online

High-dimensional and unbalanced data anomaly detection is common. Effective anomaly detection is essential for problem or disaster early warning and maintaining system reliability. A significant research issue related to the data analysis of the sensor is the detection of anomalies. The anomaly detection is essentially an unbalanced sequence binary classification. The data of this type contains characteristics of large scale, high complex computation, unbalanced data distribution, and sequence relationship among data. This paper uses long short-term memory networks (LSTMs) combined with historical sequence data; also, it integrates the synthetic minority oversampling technique (SMOTE) algorithm and K-nearest neighbors (kNN), and it designs and constructs an anomaly detection network model based on kNN-SMOTE-LSTM in accordance with the data characteristic of being unbalanced. This model can continuously filter out and securely generate samples to improve the performance of the model through kNN discriminant classifier and avoid the blindness and limitations of the SMOTE algorithm in generating new samples. The experiments demonstrated that the structured kNN-SMOTE-LSTM model can significantly improve the performance of the unbalanced sequence binary classification.