Jisuanji kexue yu tansuo (Aug 2023)

Semi-supervised Deep Document Clustering Model with Supplemented User Intention

  • LI Jingnan, HUANG Ruizhang, REN Lina

DOI
https://doi.org/10.3778/j.issn.1673-9418.2203064
Journal volume & issue
Vol. 17, no. 8
pp. 1928 – 1937

Abstract

Traditional document clustering algorithms group documents by measuring the similarity between them, but they cannot infer a user's subjective clustering intention from the small amount of supervision information that the user provides. As application scenarios diversify, the same dataset may have more than one valid clustering result, depending on the intention of the user who guides the clustering. How to obtain clustering results that follow the user's intention is therefore one of the open problems in current research. A second problem is how to learn as much of the user's clustering intention as possible when only a small amount of supervision information is available. To address both problems, a semi-supervised deep document clustering model with supplemented user intention (SDDCS) is proposed. SDDCS constructs an intention matrix from the supervision information given by the user in order to mine the user's intention. The unknown elements of the intention matrix are then filled in by a matrix factorization and supplement algorithm, so that the user's intention is learned to the greatest possible extent. The supplemented intention matrix guides the document clustering process, with the user's intention serving as one of the bases for clustering, and the model finally produces clustering results that are consistent with that intention. Experiments on four public document datasets show that SDDCS achieves higher clustering performance, which demonstrates its effectiveness.
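
The idea of an intention matrix whose unknown elements are supplemented by matrix factorization can be illustrated with a minimal sketch. It is not the SDDCS implementation: the abstract does not say what form the supervision takes or which factorization is used, so the sketch assumes pairwise must-link and cannot-link constraints and a generic symmetric low-rank fit by gradient descent. The function names, the 0/1 encoding, and all hyperparameters are hypothetical.

```python
import numpy as np


def build_intention_matrix(n_docs, must_link, cannot_link):
    """Encode pairwise supervision as a symmetric matrix plus a mask of known entries.

    Assumption: 1.0 means "same cluster", 0.0 means "different clusters".
    """
    M = np.zeros((n_docs, n_docs))
    known = np.eye(n_docs, dtype=bool)  # each document belongs with itself
    M[known] = 1.0
    for pairs, value in ((must_link, 1.0), (cannot_link, 0.0)):
        for i, j in pairs:
            M[i, j] = M[j, i] = value
            known[i, j] = known[j, i] = True
    return M, known


def complete_intention_matrix(M, known, rank=2, lr=0.01, reg=0.01,
                              n_iter=5000, seed=0):
    """Fill unknown entries with a symmetric low-rank fit U @ U.T.

    Only the known entries contribute to the squared error; reg is an
    L2 penalty on U. This is a generic technique, not the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((M.shape[0], rank))
    for _ in range(n_iter):
        residual = np.where(known, U @ U.T - M, 0.0)  # error on known entries only
        grad = 2.0 * (residual + residual.T) @ U + 2.0 * reg * U
        U -= lr * grad
    estimate = np.clip(U @ U.T, 0.0, 1.0)
    return np.where(known, M, estimate)  # user-given entries stay unchanged


# Hypothetical supervision for five documents.
must_link = [(0, 1), (1, 2), (3, 4)]
cannot_link = [(0, 3)]
M, known = build_intention_matrix(5, must_link, cannot_link)
completed = complete_intention_matrix(M, known)
```

In this sketch the completed matrix could serve as a pairwise guidance signal during clustering, for example as a penalty on assignments that disagree with it. How SDDCS actually injects the supplemented matrix into its deep clustering objective is not specified in the abstract.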

Keywords