Proceedings of the XXth Conference of Open Innovations Association FRUCT (Apr 2020)
Stream Data Preprocessing: Outlier Detection Based on the Chebyshev Inequality With Applications
Abstract
A novel method of outlier detection for preprocessing stream data is proposed for conditions of uncertainty in which the measurement means and their standard errors are the only available summary information about the initial data statistics. Because neither the initial data samples nor their sample sizes are known, classical outlier detection methods, including nonparametric statistical methods, cannot be applied in this case. The proposed approach is based on the classical Gauss-Chebyshev type probability inequalities: the confidence intervals constructed from these inequalities make it possible to pose hypothesis testing problems analogous to the classical settings, namely minimizing the upper bound of the Bayesian risk and maximizing the lower bound of the test power in the Neyman-Pearson sense. The results of processing real-life data (Lunar Laser Ranging data) and model data show unexpectedly good outlier detection performance.
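As a minimal illustration of the idea, and not the procedure developed in the paper, the sketch below flags a stream record as an outlier when its reported mean lies outside a distribution-free confidence interval around a reference value. It assumes the Chebyshev bound P(|X - mu| >= k*sigma) <= 1/k^2, so that k = 1/sqrt(alpha) for significance level alpha, and optionally the sharper Vysochanskij-Petunin bound 4/(9k^2) for unimodal distributions, used here as a stand-in for a Gauss-type inequality. The names `chebyshev_multiplier`, `flag_outliers`, and `reference` are hypothetical and introduced only for this example.

```python
import math


def chebyshev_multiplier(alpha, unimodal=False):
    """Return k such that P(|X - mu| >= k * sigma) <= alpha.

    Chebyshev (any distribution with finite variance): k = 1 / sqrt(alpha).
    Vysochanskij-Petunin (unimodal, assumed here): k = 2 / (3 * sqrt(alpha)),
    valid only when k > sqrt(8/3), i.e. alpha < 1/6.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if unimodal:
        if alpha >= 1.0 / 6.0:
            raise ValueError("unimodal bound requires alpha < 1/6")
        return 2.0 / (3.0 * math.sqrt(alpha))
    return 1.0 / math.sqrt(alpha)


def flag_outliers(records, reference, alpha=0.05, unimodal=False):
    """Flag each (mean, standard_error) record lying outside the interval.

    `reference` is an assumed expected value (for example, a model
    prediction); it is not a quantity defined in the source text.
    """
    k = chebyshev_multiplier(alpha, unimodal)
    return [abs(mean - reference) > k * se for mean, se in records]


if __name__ == "__main__":
    # Each record carries only a mean and its standard error.
    stream = [(10.0, 1.0), (14.0, 1.0), (16.0, 1.0)]
    print(flag_outliers(stream, reference=10.0, alpha=0.04))
    print(flag_outliers(stream, reference=10.0, alpha=0.04, unimodal=True))
```

Only the mean and standard error of each record enter the test, which matches the setting where raw samples and sample sizes are unavailable. The unimodality assumption narrows the interval by a factor of 2/3, so a record that passes the Chebyshev test may be rejected under the sharper bound.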
Keywords