Управленческое консультирование (Jun 2020)

Hierarchical clustering methods in a task to find abnormal observations based on groups with broken symmetry

  • A. N. Kislyakov,
  • S. V. Polyakov

DOI
https://doi.org/10.22394/1726-1139-2020-5-116-127
Journal volume & issue
Vol. 0, no. 5
pp. 116 – 127

Abstract

This work addresses the pressing problem of identifying and interpreting anomalous observations in the study of socio-economic processes. The proposed method takes a cluster-based approach to anomaly detection. Clustering is performed with hierarchical methods, a family of data-ordering algorithms that build dendrograms from groups of observed points. For mixed data containing both numeric and categorical variables, the Gower distance is proposed as the measure of distance between elements. Clustering quality is evaluated using the within-cluster sum of squared distances between objects and the average silhouette width. These indicators make it possible to select the optimal number of clusters and to assess the quality of the resulting partition. The dendrogram can also be used to study the symmetry groups of cluster systems and the causes of symmetry breaking. Anomalies are detected by analyzing the results of hierarchical clustering and identifying dendrogram branches that are located at the initial levels of tree construction and have no sub-branches. The implemented method allows a more accurate interpretation of clustering results with respect to identifying errors of the first and second kind (Type I and Type II errors) that appear as anomalous observations in the data set. The described method can be used to investigate socio-economic systems effectively and to manage their development.
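
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy data, the helper names `gower_matrix` and `mean_silhouette`, the choice of average linkage, and the rule "a cluster of size one at the best cut is an anomaly" are all assumptions made for illustration. The Gower distance is computed here as the mean of range-scaled absolute differences (numeric columns) and simple mismatches (categorical columns), and the number of clusters is chosen by the average silhouette width alone.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gower_matrix(num, cat):
    """Pairwise Gower distance for mixed data (hypothetical helper).

    Numeric columns contribute |x - y| / range, categorical columns
    contribute 0 on a match and 1 on a mismatch; the result is the
    unweighted mean over all columns.
    """
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # avoid division by zero for constant columns
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / ranges
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    return np.concatenate([d_num, d_cat], axis=2).mean(axis=2)


def mean_silhouette(dist, labels):
    """Average silhouette width from a precomputed distance matrix."""
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # a singleton's silhouette is taken to be 0
        a = dist[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(dist[i, labels == k].mean()
                for k in np.unique(labels) if k != labels[i])  # nearest other cluster
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()


# Toy data (assumed): two numeric features and one categorical feature.
# The last row is deliberately far from both groups.
num = [[30, 25], [32, 27], [31, 26], [29, 24],
       [60, 50], [62, 52], [61, 51], [59, 49],
       [200, 90]]
cat = [["north"]] * 4 + [["south"]] * 4 + [["east"]]

dist = gower_matrix(num, cat)
# linkage expects a condensed distance vector, hence squareform
tree = linkage(squareform(dist, checks=False), method="average")

# Choose the number of clusters by the average silhouette width
scores = {}
for k in range(2, 6):
    labels = fcluster(tree, t=k, criterion="maxclust")
    scores[k] = mean_silhouette(dist, labels)
best_k = max(scores, key=scores.get)

# Flag observations that form a cluster on their own at the chosen cut:
# they join the tree near the root and have no sub-branches.
labels = fcluster(tree, t=best_k, criterion="maxclust")
ids, sizes = np.unique(labels, return_counts=True)
anomalies = np.flatnonzero(np.isin(labels, ids[sizes == 1])).tolist()
print("best k:", best_k, "anomalous row indices:", anomalies)
```

On real data, the singleton rule would need to be checked against the dendrogram itself, since whether an isolated branch is a genuine anomaly or a false alarm is exactly the interpretation question the abstract raises.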

Keywords