IEEE Access (Jan 2018)
Higher-Fidelity Frugal and Accurate Quantile Estimation Using a Novel Incremental Discretized Paradigm
Abstract
Traditional pattern classification works with the moments of the distributions of the features and involves the estimation of the means and variances. More recently, by contrast, research has demonstrated the power of using the quantiles of the distributions, because they are more robust and applicable to nonparametric methods. The estimation of quantiles is even more pertinent when one is mining data streams. However, the complexity of quantile estimation is much higher than that of estimating the mean and variance, and this increased complexity becomes more significant as the size of the data grows. Clearly, in the context of infinite data streams, computational and space complexities that are linear in the size of the data are not affordable. To alleviate this, a very limited number of recent studies have devised incremental quantile estimators [1], [2]. Estimators within this class update the quantile estimates based on the most recent observation(s), which yields updating schemes with a very small computational footprint: a constant-time (i.e., O(1)) complexity. In this paper, we pursue this research direction and present an estimator that we refer to as a higher-fidelity Frugal [1] quantile estimator. First, it constitutes a substantial advancement of the family of Frugal estimators introduced in [1]. The highlight of the present scheme is that it works in the discretized space, and it is thus a pioneering algorithm within the theory of discretized algorithms. The convergence results that we have proven are based on the theory of stochastic point location [3], which we advocate as a new tool for solving a large class of online estimation problems. Extensive simulation results show that our estimator outperforms the original Frugal algorithm in terms of both speed and accuracy.
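To make the class of estimators concrete, the following is a minimal sketch of a Frugal-style one-unit-of-memory update rule in the spirit of [1]. The function name, step size, and random-number handling are illustrative assumptions, not the authors' exact algorithm: a single running estimate is nudged toward each new observation with probabilities chosen so that it drifts to the target quantile, using O(1) time and memory per sample.

```python
import random


def frugal_quantile(stream, q, step=1.0, seed=0):
    """Sketch of a Frugal-1U-style incremental quantile estimator.

    Maintains one running estimate m of the q-quantile. On each new
    observation x, m is increased by `step` with probability q when
    x > m, and decreased by `step` with probability (1 - q) when
    x < m, so m performs a random walk that drifts toward the true
    q-quantile of the stream. (Names and step size are illustrative.)
    """
    rng = random.Random(seed)
    m = 0.0
    for x in stream:
        if x > m and rng.random() < q:
            m += step
        elif x < m and rng.random() < 1.0 - q:
            m -= step
    return m


if __name__ == "__main__":
    # Estimate the median (q = 0.5) of a uniform(0, 100) stream;
    # the estimate should settle near 50 after enough samples.
    rng = random.Random(42)
    data = (rng.uniform(0, 100) for _ in range(200_000))
    print(frugal_quantile(data, q=0.5))
```

The fixed integer step is what makes the scheme "frugal": no samples are stored and no densities are modeled, at the cost of a persistent random-walk fluctuation around the target, which is precisely the accuracy limitation the discretized paradigm in this paper aims to improve upon.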
Keywords