Journal of King Saud University: Computer and Information Sciences (Jan 2023)

Intention-guided deep semi-supervised document clustering via metric learning

  • Li Jingnan,
  • Lin Chuan,
  • Huang Ruizhang,
  • Qin Yongbin,
  • Chen Yanping

Journal volume & issue
Vol. 35, no. 1
pp. 416 – 425

Abstract

An intention expresses a user’s preference for how documents should be divided into structures. Intention-guided document structure division is an important task in text mining, and deep semi-supervised document clustering offers a promising route to such personalized document clustering. However, traditional deep semi-supervised clustering models rely on a limited number of constraints, which is insufficient for intention-guided document clustering. Moreover, document representations normally carry different emphases that reflect different structural opinions. In this paper, we propose an intention-guided deep semi-supervised document clustering model, IGSC, which divides document structure based on a small amount of user-provided supervised information. IGSC uses a deep metric learning network to address these problems. The deep metric learner explores the user’s global intention from a small number of user-provided pairwise constraints and outputs an intention matrix, which is used to guide representation learning. IGSC also uses the intention matrix to guide the clustering process, so that the clustering results best match the user’s intention. We compare IGSC with a number of document clustering models on four real-world text datasets: Reu-10k, BBC, ACM, and Abstract. The results show that IGSC clearly improves clustering performance, outperforming the best result of the benchmark models by 7% on average. The comparison with other models and the visualization results demonstrate that IGSC is effective.
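The idea of learning a metric from pairwise constraints can be illustrated with a much simpler stand-in than the deep network the abstract describes. The sketch below is not IGSC: it assumes a plain linear (Mahalanobis-style) metric trained with a contrastive hinge loss, and the names `learn_intention_matrix`, `metric_distance`, `must_link`, and `cannot_link`, as well as the toy data, are hypothetical. It only shows how a handful of must-link and cannot-link pairs might shape a matrix that emphasizes the features the user cares about.

```python
import numpy as np

def learn_intention_matrix(X, must_link, cannot_link, margin=1.0, lr=0.05, epochs=200):
    """Learn M = L^T L from pairwise constraints (illustrative linear stand-in, not IGSC)."""
    d = X.shape[1]
    L = np.eye(d)  # start from the Euclidean metric
    n_pairs = max(1, len(must_link) + len(cannot_link))
    for _ in range(epochs):
        grad = np.zeros((d, d))
        # Must-link pairs: pull together by minimizing ||L (x_i - x_j)||^2.
        for i, j in must_link:
            diff = (X[i] - X[j])[:, None]
            grad += 2.0 * L @ diff @ diff.T
        # Cannot-link pairs: push apart while their squared distance is below the margin.
        for i, j in cannot_link:
            diff = (X[i] - X[j])[:, None]
            if margin - float(np.sum((L @ diff) ** 2)) > 0:
                grad -= 2.0 * L @ diff @ diff.T
        L -= lr * grad / n_pairs
    return L.T @ L

def metric_distance(M, a, b):
    """Squared distance between a and b under the learned matrix M."""
    diff = a - b
    return float(diff @ M @ diff)

# Toy data (assumed): feature 0 matches the user's intention, feature 1 does not.
X = np.array([[0.0, 0.0], [0.0, 1.0], [0.5, 0.0], [0.5, 1.0]])
must_link = [(0, 1), (2, 3)]     # pairs the user wants in the same cluster
cannot_link = [(0, 2), (1, 3)]   # pairs the user wants in different clusters

M = learn_intention_matrix(X, must_link, cannot_link)
same = metric_distance(M, X[0], X[1])
different = metric_distance(M, X[0], X[2])
```

In this toy setting the learned matrix should weight feature 0 far more heavily than feature 1, so the must-link pair ends up closer than the cannot-link pair even though it was farther apart under the Euclidean metric. A clustering step that measures distances with `M` instead of the identity would then tend to follow the user's constraints.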

Keywords