IEEE Access (Jan 2023)

Self-Supervised and Few-Shot Contrastive Learning Frameworks for Text Clustering

  • Haoxiang Shi,
  • Tetsuya Sakai

DOI
https://doi.org/10.1109/ACCESS.2023.3302913
Journal volume & issue
Vol. 11
pp. 84134 – 84143

Abstract


Contrastive learning is a promising approach to unsupervised learning, as it inherits the advantages of well-studied deep models without requiring a dedicated and complex model design. In this paper, based on bidirectional encoder representations from transformers (BERT) and long short-term memory (LSTM) neural networks, we propose self-supervised contrastive learning (SCL) as well as few-shot contrastive learning (FCL) with unsupervised data augmentation (UDA) for text clustering. BERT-SCL outperforms state-of-the-art unsupervised clustering approaches for both short and long texts in terms of several clustering evaluation measures. LSTM-SCL also shows good performance for short-text clustering. BERT-FCL achieves performance close to supervised learning, and BERT-FCL with UDA further improves performance for short texts. LSTM-FCL outperforms the supervised model in terms of several clustering evaluation measures. Our experimental results suggest that both SCL and FCL are effective for text clustering.
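To make the general idea concrete, the sketch below illustrates self-supervised contrastive learning over BERT sentence embeddings followed by k-means clustering. The abstract does not specify the paper's loss, augmentation, or projection head, so this uses the standard NT-Xent objective with mean-pooled embeddings and a placeholder augmentation; all function names and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of BERT-based self-supervised contrastive learning (SCL)
# for text clustering. The NT-Xent loss, mean pooling, and the toy augmentation
# are assumptions made for illustration; the paper's exact setup may differ.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Mean-pooled BERT embeddings for a list of texts."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)            # (B, H)

def nt_xent(z1, z2, temperature=0.1):
    """NT-Xent contrastive loss between two views of the same batch."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)     # (2B, H)
    sim = z @ z.t() / temperature                           # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                       # exclude self-similarity
    b = z1.size(0)
    # the positive for example i is its other view at index (i + B) mod 2B
    targets = torch.cat([torch.arange(b) + b, torch.arange(b)])
    return F.cross_entropy(sim, targets)

texts = ["contrastive learning for clustering", "short text example"]
views = [t.lower() for t in texts]   # placeholder augmentation (the paper uses UDA for FCL)
loss = nt_xent(embed(texts), embed(views))
loss.backward()                      # one illustrative optimization step would follow

# After contrastive training, cluster the learned representations.
with torch.no_grad():
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(embed(texts).numpy())
```

In the few-shot (FCL) variant described in the abstract, a small labeled set would additionally supply positive and negative pairs, and UDA-style augmented examples would replace the toy lowercasing augmentation above.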

Keywords