IEEE Access (Jan 2024)

Enhancing Web Text Clustering Accuracy and Efficiency With a Maximum Entropy Function Model: Overcoming High-Dimensional and Directional Challenges

  • Xumin Zhao,
  • Guojie Xie,
  • Yi Luo,
  • Fenghua Liu,
  • Hongpeng Bai

DOI
https://doi.org/10.1109/ACCESS.2024.3374770
Journal volume & issue
Vol. 12
pp. 42961 – 42973

Abstract

With the rapid development of large models such as ChatGPT, text clustering has become an important research topic in data mining. Traditional clustering algorithms, however, struggle with text data because it is high-dimensional and directional. Research on web text mining in particular remains insufficient, so both the accuracy and the efficiency of clustering algorithms need to be improved. To address these challenges, this paper proposes a maximum entropy function model and applies it to web text clustering. Unlike traditional clustering algorithms, the proposed algorithm avoids local minima and reaches the global minimum. The resulting text clustering method, MEMC, uses the maximum entropy function model to overcome the challenges of high-dimensional and directional features. On international standard datasets, MEMC achieves a purity 15% higher than that of the popular k-means algorithm and 6% higher than that of the AP algorithm. This study helps strengthen web text mining and provides valuable insights for future research.

Keywords