IEEE Access (Jan 2022)

Unsupervised Outlier Detection for Mixed-Valued Dataset Based on the Adaptive k-Nearest Neighbor Global Network

  • Yu Wang
  • Xuejing Cao
  • Yupeng Li

DOI
https://doi.org/10.1109/ACCESS.2022.3161481
Journal volume & issue
Vol. 10
pp. 32093 – 32103

Abstract

Outlier detection aims to reveal data patterns that differ from those of the existing data. Benefiting from its robustness and interpretability, the outlier detection method for numerical datasets based on the $k$-Nearest Neighbor ($k$-NN) network has attracted much attention in recent years. However, the datasets produced in many practical contexts tend to contain both numerical and categorical attributes, that is, they are datasets with mixed-valued attributes (DMAs). Moreover, the selection of $k$ is an issue that deserves attention for unlabeled datasets. Therefore, an unsupervised outlier detection method for DMAs based on an adaptive $k$-NN global network is proposed. First, an adaptive search algorithm that determines an appropriate value of $k$ from the distribution characteristics of the dataset is introduced. Next, the distance between mixed-valued data objects is measured with the Heterogeneous Euclidean-Overlap Metric (HEOM), and the $k$ nearest neighbors of each data object are obtained. Then, an adaptive $k$-NN global network is constructed from the neighborhood relationships between data objects, and a customized random walk is executed on it to detect outliers, using transition probabilities to constrain the behavior of the random walker. Finally, the effectiveness, accuracy, and applicability of the proposed method are demonstrated through a detailed experiment.
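The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. HEOM itself is a standard metric: overlap (0 if equal, 1 otherwise) for categorical attributes, range-normalized absolute difference for numerical ones, and the maximal distance 1 when a value is missing. Everything else here is an assumption made for illustration: $k$ is a fixed parameter rather than the output of the paper's adaptive search, and the walk is a generic PageRank-style random walk with uniform transition probabilities over each object's $k$-NN and a hypothetical damping factor of 0.85, not the authors' customized transition probabilities. The helper names (`attribute_ranges`, `heom`, `knn_lists`, `random_walk_scores`, `outlier_ranking`) are hypothetical. Objects that few others list as neighbors receive little visiting probability and are ranked as more outlying.

```python
import math


def attribute_ranges(data, is_cat):
    """Per-attribute range (max - min) for numerical columns; None for categorical ones."""
    ranges = []
    for i, cat in enumerate(is_cat):
        if cat:
            ranges.append(None)  # range is unused for categorical attributes
        else:
            vals = [row[i] for row in data if row[i] is not None]
            ranges.append(max(vals) - min(vals) if vals else 0.0)
    return ranges


def heom(a, b, is_cat, ranges):
    """Heterogeneous Euclidean-Overlap Metric between two mixed-valued records."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            d = 1.0  # missing value: maximal per-attribute distance
        elif is_cat[i]:
            d = 0.0 if x == y else 1.0  # overlap metric
        else:
            d = abs(x - y) / ranges[i] if ranges[i] > 0 else 0.0  # range-normalized
        total += d * d
    return math.sqrt(total)


def knn_lists(data, is_cat, k):
    """Indices of the k nearest neighbors (by HEOM) of every object, excluding itself."""
    n = len(data)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of objects")
    ranges = attribute_ranges(data, is_cat)
    neighbors = []
    for i in range(n):
        dists = sorted((heom(data[i], data[j], is_cat, ranges), j)
                       for j in range(n) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def random_walk_scores(neighbors, damping=0.85, iters=100):
    """Stationary visiting probabilities of a damped random walk on the directed k-NN graph.

    Assumption: from object i the walker moves to each of its k neighbors with equal
    probability, and with probability (1 - damping) it jumps to a uniformly random object.
    """
    n = len(neighbors)
    p = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n
        for i, nbrs in enumerate(neighbors):
            share = damping * p[i] / len(nbrs)
            for j in nbrs:
                nxt[j] += share
        p = nxt
    return p


def outlier_ranking(data, is_cat, k):
    """Object indices sorted from most to least outlying (lowest visiting probability first)."""
    scores = random_walk_scores(knn_lists(data, is_cat, k))
    return sorted(range(len(data)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Toy mixed-valued dataset: one numerical and one categorical attribute.
    data = [(1.0, "a"), (1.1, "a"), (0.9, "a"), (1.05, "a"), (0.95, "a"), (10.0, "b")]
    print(outlier_ranking(data, [False, True], k=2)[0])  # → 5
```

In this toy run, no object lists record 5 among its two nearest neighbors, so the walker reaches it only through random jumps and it receives the lowest score. The damping term is a design choice that keeps the walk ergodic so the power iteration converges; the paper's own transition probabilities may serve a different purpose.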

Keywords