IEEE Access (Jan 2024)
Posts Quality Prediction for StackOverflow Website
Abstract
The development of the computer industry is closely linked to question-and-answer websites, whose primary function is to discover and solve problems encountered by users. This paper focuses on predicting the quality of question posts on the StackOverflow (SO) website, which can essentially be treated as a text classification problem. Given the large number of users, manual moderation cannot keep up with the volume of questions, so reducing the occurrence of low-quality questions can effectively alleviate the operational pressure on the website. We preprocess and vectorize the posts to obtain vector representations of the training and testing sets. We then train five machine learning models (decision tree, random forest, naive Bayes, support vector machine, and logistic regression) and two deep learning models (Bi-LSTM and BERT), and compare them experimentally under different parameter settings. The results indicate that parameter choices affect the outcome and that quality prediction performance differs significantly across models: the lowest accuracy is only 54%, while the highest is 92%. The comparison shows that quality assessment with the attention-based model is effective and can be used to predict post quality.
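As a rough illustration of the preprocess, vectorize, train, and evaluate workflow summarized above, the following minimal sketch implements a bag-of-words multinomial naive Bayes classifier in plain Python. It is not the implementation used in this paper: the tokenizer, the function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`), the labels "high" and "low", and the toy posts are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Preprocess: lowercase the text and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(posts, labels):
    """Vectorize posts as word counts and collect per-class statistics."""
    class_counts = Counter(labels)       # number of posts per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocab = set()
    for text, label in zip(posts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_posts = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_posts in class_counts.items():
        score = math.log(n_posts / total_posts)  # class prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # skip words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, posts, labels):
    """Fraction of posts whose predicted label matches the true label."""
    correct = sum(predict(model, p) == y for p, y in zip(posts, labels))
    return correct / len(labels)


# Toy data invented for illustration; not taken from the paper's dataset.
train_posts = [
    "How do I reverse a list in Python? Here is my code and the error traceback",
    "Why does this loop raise IndexError? Minimal example and expected output included",
    "plz help urgent code not working",
    "fix my homework asap plz",
]
train_labels = ["high", "high", "low", "low"]
test_posts = ["urgent plz fix", "example code with traceback and expected output"]
test_labels = ["low", "high"]

model = train_naive_bayes(train_posts, train_labels)
test_accuracy = accuracy(model, test_posts, test_labels)
print(f"test accuracy: {test_accuracy:.2f}")
```

The same train and evaluate structure would apply to the other classifiers compared in this work; only the vectorization step and the model would change.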
Keywords