IEEE Access (Jan 2020)

An Active Learning Algorithm Based on Shannon Entropy for Constraint-Based Clustering

  • Duo Wen Chen,
  • Ying Hua Jin

DOI
https://doi.org/10.1109/ACCESS.2020.3025036
Journal volume & issue
Vol. 8
pp. 171447 – 171456

Abstract

Pairwise constraints can enhance clustering performance in constraint-based clustering problems, especially when the constraints are informative. This paper proposes a novel active learning algorithm that formulates informative pairwise constraints efficiently and economically. The algorithm consists of three phases: Selecting, Exploring, and Consolidating. In the Selecting phase, an unsupervised clustering algorithm is used to obtain an informative data set in terms of Shannon entropy. In the Exploring phase, a farthest-first strategy is used to construct a series of queries that build the clustering skeleton set structure, and informative pairwise constraints are collected along the way from the informative data set. If the number of skeleton sets equals the number of clusters, the algorithm enters the third phase, Consolidating; otherwise, it terminates. In the Consolidating phase, the non-skeleton points in the informative data set are used to construct a series of queries against the representative points of the skeleton sets built in the Exploring phase, and a priority principle is proposed to help collect more must-link pairwise constraints. With the well-known MPCK-means (metric pairwise constrained K-means) as the underlying constraint-based semi-supervised clustering algorithm, the new algorithm is compared experimentally with its counterparts. The experimental results show that the new algorithm achieves a significant improvement.
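
The Selecting and Exploring phases described above can be illustrated with a minimal Python sketch. The abstract does not say which unsupervised clustering algorithm is used or how the entropy is computed, so everything here is an assumption made for illustration: soft cluster memberships come from a softmax over squared distances to given centers, the most informative points are those with the highest membership entropy, and `oracle(i, j)` is a hypothetical callable that answers whether two points belong to the same cluster. The function names (`shannon_entropy`, `soft_assignments`, `select_informative`, `explore`) are likewise hypothetical and not taken from the paper.

```python
import numpy as np

def shannon_entropy(probs, eps=1e-12):
    """Row-wise Shannon entropy (in nats) of a membership matrix."""
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def soft_assignments(X, centers, temperature=1.0):
    """Assumed soft memberships: softmax over negative squared distances."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def select_informative(X, centers, m):
    """Selecting (sketch): indices of the m points with highest entropy."""
    H = shannon_entropy(soft_assignments(X, centers))
    return np.argsort(-H)[:m]

def explore(X, candidates, oracle, k):
    """Exploring (sketch): farthest-first queries that grow skeleton sets.

    Returns the skeleton sets plus the must-link and cannot-link
    constraints collected from the oracle's answers.
    """
    skeletons = [[candidates[0]]]
    must, cannot = [], []
    remaining = list(candidates[1:])
    while remaining and len(skeletons) < k:
        # Pick the candidate farthest from every point already placed.
        placed = [i for s in skeletons for i in s]
        dmin = [min(np.linalg.norm(X[c] - X[p]) for p in placed)
                for c in remaining]
        c = remaining.pop(int(np.argmax(dmin)))
        for s in skeletons:
            if oracle(c, s[0]):          # same cluster as this skeleton
                must.append((c, s[0]))
                s.append(c)
                break
            cannot.append((c, s[0]))     # different cluster
        else:
            skeletons.append([c])        # c starts a new skeleton set
    return skeletons, must, cannot
```

In this sketch, `explore` stops as soon as the number of skeleton sets reaches `k`, which corresponds to the condition under which the algorithm would move on to the Consolidating phase. The Consolidating phase and the priority principle are not sketched, because the abstract gives no detail on them.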

Keywords