Machine Learning with Applications (Jun 2022)

An extension of Synthetic Minority Oversampling Technique based on Kalman filter for imbalanced datasets

  • Thejas G.S.,
  • Yashas Hariprasad,
  • S.S. Iyengar,
  • N.R. Sunitha,
  • Prajwal Badrinath,
  • Shasank Chennupati

Journal volume & issue
Vol. 8
p. 100267

Abstract


More often than not, data collected in real time tends to be imbalanced, i.e., the samples belonging to one class significantly outnumber those belonging to the others. This degrades the performance of the predictor. One of the most notable algorithms for handling such imbalance by fabricating synthetic data is the “Synthetic Minority Oversampling Technique (SMOTE)”. However, data imbalance is not solely responsible for the poor performance of a classifier. Certain research works have demonstrated that noisy samples can play a significant role in misclassification. Moreover, handling large data is computationally expensive. Hence, data reduction is imperative. In this work, we put forth a novel extension of SMOTE that integrates it with the Kalman filter. The proposed method, Kalman-SMOTE (KSMOTE), filters out the noisy samples from the final dataset produced by SMOTE, which includes both the raw data and the synthetically generated samples, thereby reducing the size of the dataset. Our model is validated on a wide range of datasets. An experimental analysis of the results shows that our model outperforms the presently available techniques.
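For readers unfamiliar with the oversampling step that KSMOTE builds on, here is a minimal sketch of classic SMOTE interpolation in plain Python. It is a simplified variant that picks base samples at random rather than iterating over every minority sample. Each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours. The function names `smote` and `k_nearest` are hypothetical. The Kalman-filter noise-removal stage of KSMOTE is not described in this abstract, so it is deliberately not modelled here.

```python
import math
import random


def k_nearest(index, samples, k):
    """Return the k samples closest (Euclidean) to samples[index], excluding itself."""
    base = samples[index]
    others = [s for j, s in enumerate(samples) if j != index]
    others.sort(key=lambda s: math.dist(base, s))
    return others[:k]


def smote(minority, n_synthetic, k=5, seed=None):
    """Simplified classic SMOTE (hypothetical helper, not the KSMOTE method).

    Creates n_synthetic points, each lying on the line segment between a randomly
    chosen minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbour = rng.choice(k_nearest(i, minority, k))
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


if __name__ == "__main__":
    minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
    for point in smote(minority, n_synthetic=4, k=2, seed=0):
        print(point)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region those samples span. This is also why SMOTE can reproduce existing noise: a noisy minority sample becomes an endpoint for new points. According to the abstract, KSMOTE's Kalman-filter stage is designed to remove such samples afterwards.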

Keywords