Jiàoyù zīliào yǔ túshūguǎn xué (Jul 2022)
Estimation of Topic Similarity and Its Application to Measuring Stability of Topic Modeling
Abstract
Topic modeling stability measures the extent to which models produced by the same modeling approach, for the same corpus, and with the same initial conditions have similar topics. Because the method used to calculate similarity between topics is the basis for measuring topic modeling stability, and topic alignment is a key step in that measurement, the present study first calculated the proportion of identical paired topics among the optimal combinations of paired topics generated by different topic similarity calculation methods, and then observed the distribution of similarity scores of the paired topics for each method. Finally, the study analyzed the effect of the number of topics on topic modeling stability. The topic modeling method used was the widely used LDA topic modeling, and the corpus used to build the topic models comprised about 30,000 posts collected from the Book message board of the PTT Bulletin Board System (BBS). The results indicated a high proportion of identical paired topics across the different similarity measures, although the similarity scores of the paired topics were distributed differently for each method because the methods use different kinds and amounts of information about the word distribution in each topic. The results also revealed that stability decreased noticeably as the number of topics increased.
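The topic alignment step described above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not name the similarity methods compared, so Jaccard similarity over each topic's top words is assumed here purely for illustration, and the function names (`jaccard`, `align_topics`) and the toy word lists are hypothetical. The sketch pairs the topics of two model runs one-to-one so that total similarity is maximized, then reports the mean similarity of the paired topics as a simple stability score.

```python
from itertools import permutations


def jaccard(words_a, words_b):
    """Jaccard similarity between two topics given as lists of top words."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_topics(topics_a, topics_b, sim=jaccard):
    """Find the one-to-one topic pairing with the highest total similarity.

    Brute force over all permutations, so only suitable for a handful of
    topics; it is used here to keep the sketch dependency-free.
    Returns (pairs, mean_similarity), where pairs is a list of
    (index_in_a, index_in_b) tuples.
    """
    n = len(topics_a)
    if n == 0 or n != len(topics_b):
        raise ValueError("both runs need the same non-zero number of topics")
    best_perm, best_total = None, -1.0
    for perm in permutations(range(n)):
        total = sum(sim(topics_a[i], topics_b[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(enumerate(best_perm)), best_total / n


# Toy top-word lists for two hypothetical runs of the same model.
run_1 = [
    ["book", "novel", "author"],
    ["price", "sale", "discount"],
    ["library", "borrow", "return"],
]
run_2 = [
    ["discount", "price", "coupon"],
    ["library", "borrow", "card"],
    ["novel", "author", "story"],
]

pairs, stability = align_topics(run_1, run_2)
print(pairs)      # paired topic indices (run_1 index, run_2 index)
print(stability)  # mean similarity of the paired topics
```

For realistic numbers of topics the brute-force search becomes infeasible, and an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual substitute. Running the same alignment with a different similarity function passed as `sim` and comparing the resulting `pairs` gives the kind of "proportion of identical paired topics" comparison the abstract describes.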
Keywords