Complex & Intelligent Systems (Dec 2022)

Adaptive encoding-based evolutionary approach for Chinese document clustering

  • Jun-Xian Chen,
  • Yue-Jiao Gong,
  • Wei-Neng Chen,
  • Xiaolin Xiao

DOI
https://doi.org/10.1007/s40747-022-00934-z
Journal volume & issue
Vol. 9, no. 3
pp. 3385 – 3398

Abstract

Document clustering has long been an important research direction in intelligent systems. Applying it to Chinese documents poses new challenges, because Chinese text cannot be split into words simply by using the whitespace character. Moreover, many Chinese document clustering algorithms require prior knowledge of the cluster number, which is usually unknown in real-world applications. To address these problems, we propose a general Chinese document clustering framework in which the main clustering task is performed by an adaptive encoding-based evolutionary approach. Specifically, an adaptive encoding scheme is proposed to learn the cluster number automatically, and novel crossover and mutation operators are designed to fit this scheme. In addition, a single step of K-means is incorporated to conduct a joint global and local search, enhancing the overall exploitation ability. Experiments on benchmark datasets demonstrate the superiority of the proposed method in both efficiency and clustering precision.
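The abstract only summarizes the method. What follows is therefore a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm. Each individual is a variable-length list of centroids whose length encodes the cluster number k. Evolutionary operators change both k and the centroid positions, and a single K-means step refines every offspring. The sketch assumes the documents have already been segmented (Chinese text needs a word segmenter rather than whitespace splitting) and converted into numeric vectors. The fitness function (SSE plus a per-cluster `penalty`), the cut-and-splice `crossover`, and the add/drop `mutate` are hypothetical stand-ins for the paper's own encoding and operators, which are not described here.

```python
import random
from typing import List, Sequence

Vector = List[float]
Individual = List[Vector]  # variable-length chromosome: its length is the cluster number k


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Vector], cents: Individual) -> List[int]:
    """Label each point with the index of its nearest centroid (first one wins ties)."""
    return [min(range(len(cents)), key=lambda j: sq_dist(p, cents[j])) for p in points]


def kmeans_step(points: Sequence[Vector], cents: Individual) -> Individual:
    """One K-means iteration used as local search: assign, then move each centroid
    to the mean of its members. Centroids with no members are dropped, so k can shrink."""
    labels = assign(points, cents)
    dim = len(points[0])
    out: Individual = []
    for j in range(len(cents)):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            out.append([sum(p[d] for p in members) / len(members) for d in range(dim)])
    return out


def fitness(points: Sequence[Vector], cents: Individual, penalty: float) -> float:
    """Hypothetical criterion (lower is better): SSE plus `penalty` per cluster,
    so that adding clusters is not rewarded without limit."""
    labels = assign(points, cents)
    sse = sum(sq_dist(p, cents[lab]) for p, lab in zip(points, labels))
    return sse + penalty * len(cents)


def crossover(a: Individual, b: Individual, rng: random.Random, k_max: int) -> Individual:
    """Hypothetical cut-and-splice crossover: a prefix of `a` joined to a suffix of `b`.
    Parents may differ in length, so the child's k can differ from both."""
    child = a[: rng.randint(1, len(a))] + b[rng.randint(0, len(b) - 1):]
    return [list(c) for c in child[:k_max]]


def mutate(points: Sequence[Vector], cents: Individual,
           rng: random.Random, k_max: int) -> Individual:
    """Hypothetical mutation: drop a random centroid or add a random data point as one."""
    c = [list(v) for v in cents]
    if len(c) > 1 and (len(c) >= k_max or rng.random() < 0.5):
        c.pop(rng.randrange(len(c)))
    else:
        c.append(list(rng.choice(points)))
    return c


def evolve(points: Sequence[Vector], k_max: int = 8, pop_size: int = 10,
           gens: int = 30, penalty: float = 1.0, seed: int = 0) -> Individual:
    """Elitist evolutionary loop: global search via crossover/mutation over k and
    centroid positions, local refinement via a single K-means step per child."""
    rng = random.Random(seed)
    points = list(points)
    # Each initial individual picks a random k in [2, k_max] and random data points as centroids.
    pop: List[Individual] = [
        [list(p) for p in rng.sample(points, rng.randint(2, min(k_max, len(points))))]
        for _ in range(pop_size)
    ]
    for _ in range(gens):
        kids = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            child = mutate(points, crossover(a, b, rng, k_max), rng, k_max)
            kids.append(kmeans_step(points, child))
        # Keep the best pop_size individuals from parents and offspring combined.
        pop = sorted(pop + kids, key=lambda c: fitness(points, c, penalty))[:pop_size]
    return pop[0]
```

Calling `evolve(vectors)` returns the best centroid list found, and `len(best)` is the learned k. In this sketch the hypothetical `penalty` weight governs how readily extra clusters are accepted. A real system would replace it with a proper cluster-validity index.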

Keywords