IEEE Access (Jan 2019)

Stratified Feature Sampling for Semi-Supervised Ensemble Clustering

  • Jialin Tian,
  • Yazhou Ren,
  • Xiang Cheng

DOI
https://doi.org/10.1109/ACCESS.2019.2939581
Journal volume & issue
Vol. 7
pp. 128669 – 128675

Abstract

Ensemble clustering (EC), which seeks to generate a consensus clustering by integrating multiple base clusterings, has attracted increasing attention. However, traditional EC methods typically have three main limitations: (1) high-dimensional data pose a major challenge to ensemble clustering methods; (2) most EC algorithms cannot use prior information, e.g., pairwise constraints, to enhance clustering performance; and (3) even existing semi-supervised ensemble clustering methods do not make sufficient use of prior information, e.g., using it only when generating base clusterings. To alleviate these problems, we propose Stratified Feature Sampling for Semi-Supervised Ensemble Clustering (SFS3EC). First, we develop a novel stratified feature sampling method, which can cope with high-dimensional data, guarantee the diversity of base clusterings, and reduce the risk that some features are not selected at the same time. Second, semi-supervised clustering, i.e., constraint propagation, is applied to obtain base clusterings. Finally, we propose to utilize prior information in both the base clustering generation process and the consensus process, which guarantees that prior information is sufficiently used. We conduct a series of experiments on ten real-world data sets to demonstrate the effectiveness of the proposed model.
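
The abstract does not spell out how the strata are built, so the following is only a minimal sketch of the general idea of stratified feature sampling, not the SFS3EC procedure itself. It assumes that features are shuffled and dealt into equally sized strata, and that each base clustering draws a fixed number of features from every stratum. The function name `stratified_feature_subsets` and all of its parameters are hypothetical.

```python
import random


def stratified_feature_subsets(n_features, n_strata, per_stratum, n_subsets, seed=0):
    """Draw feature-index subsets, one per base clustering.

    Assumption (not from the paper): strata are formed by shuffling the
    feature indices and dealing them round-robin into n_strata groups.
    """
    rng = random.Random(seed)
    features = list(range(n_features))
    rng.shuffle(features)

    # Deal the shuffled indices into n_strata nearly equal groups.
    strata = [features[i::n_strata] for i in range(n_strata)]

    subsets = []
    for _ in range(n_subsets):
        subset = []
        for stratum in strata:
            # Sample without replacement inside each stratum, so every
            # stratum contributes to every subset.
            k = min(per_stratum, len(stratum))
            subset.extend(rng.sample(stratum, k))
        subsets.append(sorted(subset))
    return strata, subsets


if __name__ == "__main__":
    strata, subsets = stratified_feature_subsets(
        n_features=20, n_strata=4, per_stratum=2, n_subsets=5
    )
    for s in subsets:
        print(s)
```

Each subset would then be passed to a semi-supervised base clusterer. Because every stratum contributes to every subset, the subsets differ from one another while still covering all parts of the feature space, which is the property the abstract attributes to stratified sampling.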

Keywords