IEEE Access (Jan 2019)

An Effective Minimal Probing Approach With Micro-Cluster for Distance-Based Outlier Detection in Data Streams

  • Mohamed Jaward Bah,
  • Hongzhi Wang,
  • Mohamed Hammad,
  • Furkh Zeshan,
  • Hanan Aljuaid

DOI
https://doi.org/10.1109/ACCESS.2019.2946966
Journal volume & issue
Vol. 7
pp. 154922 – 154934

Abstract

Outlier detection in data streams is a significant data mining task that targets the discovery of outlying elements in data arriving at an unprecedented rate. The fast arrival of data demands fast computation with minimal memory usage. Detecting distance-based outliers in such a scenario is more complicated. Existing techniques, such as the two best-known methods, Micro-Cluster Outlier Detection (MCOD) and Thresh_LEAP, have presented some solutions to these challenges. However, combining the strengths of both techniques can improve on each individual method. Therefore, in this paper, we propose a method called Micro-Cluster with Minimal Probing (MCMP), a hybrid approach that combines the strengths of MCOD and Thresh_LEAP. We offer a new distance-based outlier detection technique that minimizes the computational cost of detecting distance-based outliers effectively. The proposed MCMP technique comprises two approaches. First, we adopt micro-clusters to mitigate the range query search. Then, to deal with the objects outside the micro-clusters, we propose the concept of differentiating between strong and trivial inliers. The proposed method improves computational speed and memory consumption while maintaining outlier detection accuracy. Our experiments are conducted on both real-world and synthetic data sets. We varied the window size $(w)$, neighbor count threshold $(k)$, and distance threshold $(R)$, and observed that our method outperforms the state-of-the-art methods in both CPU time and memory consumption on the majority of the data sets.
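
To make the roles of $w$, $k$, and $R$ concrete, the following is a minimal sketch of the usual distance-based outlier definition over a sliding window: a point is flagged when fewer than $k$ other points in the current window lie within distance $R$ of it. This is a naive baseline for illustration only. It is not MCMP, MCOD, or Thresh_LEAP. The function names, the use of Euclidean distance, the inclusive `<= R` comparison, and the exclusion of a point from its own neighbor count are all assumptions made for this example.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def distance_outliers(window, R, k):
    """Return the points in `window` with fewer than k neighbors within R.

    Assumption: a point does not count as its own neighbor.
    """
    points = list(window)
    outliers = []
    for i, p in enumerate(points):
        neighbors = 0
        for j, q in enumerate(points):
            if i != j and dist(p, q) <= R:
                neighbors += 1
                if neighbors >= k:
                    break  # enough neighbors found, p is an inlier
        if neighbors < k:
            outliers.append(p)
    return outliers


def stream_outliers(stream, w, R, k):
    """Yield (window contents, outliers) each time the window holds w points.

    Hypothetical helper: a count-based window that slides by one point.
    """
    window = deque(maxlen=w)  # the oldest point expires automatically
    for point in stream:
        window.append(point)
        if len(window) == w:
            yield list(window), distance_outliers(window, R, k)


if __name__ == "__main__":
    stream = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (0.1, 0.1)]
    for contents, outliers in stream_outliers(stream, w=4, R=0.5, k=2):
        print(contents, "->", outliers)
```

Every window re-scans all pairs, so the cost grows quadratically with $w$. That range query cost is the expense the abstract says micro-clusters are adopted to mitigate.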

Keywords